Abstract

Classic similarity measures of strings are longest common subsequence and Levenshtein distance (i.e., the classic edit distance). A classic similarity measure of curves is dynamic time warping. These measures can be computed by simple $O(n^2)$ dynamic programming algorithms, and despite much effort no algorithms with significantly better running time are known.

We prove that, even restricted to binary strings or one-dimensional curves, respectively, these measures do not have strongly subquadratic time algorithms, i.e., no algorithms with running time $O(n^{2-\varepsilon})$ for any $\varepsilon > 0$, unless the Strong Exponential Time Hypothesis fails. We generalize
the result to edit distance for arbitrary fixed costs of the four operations (deletion in one of the 
two strings, matching, substitution), by identifying trivial cases that can be solved in constant 
time, and proving quadratic-time hardness on binary strings for all other cost choices. This 
improves and generalizes the known hardness result for Levenshtein distance [Backurs, Indyk 
STOC’15] by the restriction to binary strings and the generalization to arbitrary costs, and adds 
important problems to a recent line of research showing conditional lower bounds for a growing 
number of quadratic time problems. 

As our main technical contribution, we introduce a framework for proving quadratic-time 
hardness of similarity measures. To apply the framework it suffices to construct a single gadget, 
which encapsulates all the expressive power necessary to emulate a reduction from satisfiability. 

Finally, we prove quadratic-time hardness for longest palindromic subsequence and longest 
tandem subsequence via reductions from longest common subsequence, showing that conditional 
lower bounds based on the Strong Exponential Time Hypothesis also apply to string problems 
that are not necessarily similarity measures. 


1 Introduction 

For many classic polynomial time problems the worst-case running time has been stagnant for decades: e.g., a classic algorithm solves the problem in time $O(n^2)$, up to logarithmic factors, but it is unknown whether any faster algorithms exist. For these problems we would like to explain why it is hard to find faster algorithms. One type of explanation is a conditional lower bound. Here we assume that some problem P has no algorithms faster than a long-standing time barrier and prove resulting lower bounds for other problems, via reductions from P. The most prominent such approach is 3SUM-hardness, which dates back to 1995 […]: Assuming that 3SUM has no (strongly) subquadratic algorithms, many lower bounds have been shown, especially for problems in computational geometry. However, for many other problems it seems to be impossible to find a reduction from 3SUM.

In recent years, new assumptions have emerged that allow one to prove conditional lower bounds for problems where 3SUM-hardness does not seem to apply. The prime example is the Strong Exponential Time Hypothesis (SETH), which was introduced by Impagliazzo and Paturi [13] and asserts that satisfiability has no algorithms that are much faster than exhaustive search.

Hypothesis SETH: For no $\varepsilon > 0$ can $k$-SAT be solved in time $O(2^{(1-\varepsilon)N})$ for all $k \ge 3$, where $N$ denotes the number of variables.

Note that exhaustive search takes time $O(2^N)$ and the best-known algorithms for $k$-SAT have a running time of the form $O(2^{(1-c/k)N})$ for some constant $c > 0$ [18]. Thus, SETH is a reasonable hypothesis and, due to lack of progress in the last decades, can be considered unlikely to fail.

The idea to use SETH to prove conditional lower bounds for polynomial time problems dates back to 2005 [23], but only in recent years have more and more such conditional lower bounds been proven; see, e.g., […, 19]. Two recent examples that motivated this paper are the conditional lower bounds for Frechet distance [8] and Levenshtein distance [6]. Both problems are natural similarity measures between two sequences (curves or strings, respectively). In this paper we study additional classic similarity measures between strings and curves. We propose a framework for proving lower bounds for such similarity measures. This allows us to prove quadratic-time hardness of the following problems.

Edit Distance Given two strings $x, y$ of length $n, m$ ($n \ge m$), we start in their first symbols at positions $(1, 1)$ and traverse them up to their last symbols at positions $(n, m)$ using the following operations: If we are at positions $(i, j)$ we may (1) delete a symbol in $x$ (this costs $c_{\text{del-x}}$ and we advance to $(i+1, j)$), (2) delete a symbol in $y$ (this costs $c_{\text{del-y}}$ and we advance to $(i, j+1)$), (3) match the current symbols, which is only possible if $x[i] = y[j]$ (this costs $c_{\text{match}}$ and we advance to $(i+1, j+1)$), or (4) substitute the current symbols, which is only possible if $x[i] \ne y[j]$ (this costs $c_{\text{subst}}$ and we advance to $(i+1, j+1)$). The minimum total cost of such a sequence of operations is called the edit distance of $x$ and $y$, and we denote the problem of computing the edit distance by Edit($c_{\text{del-x}}, c_{\text{del-y}}, c_{\text{match}}, c_{\text{subst}}$). The Levenshtein distance (i.e., the classic edit distance) is Edit(1, 1, 0, 1). An important special case is the longest common subsequence (LCS) of two strings, which can be seen to be equivalent to Edit(1, 1, 0, 2). One obtains more variants for other cost choices, e.g., for aligning DNA sequences a classic choice is Edit(2, 2, $-1$, 1) [22].
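The definition above can be evaluated with the textbook $O(nm)$ dynamic program. The following Python sketch is illustrative only: the function names `edit_distance` and `lcs_length` are our own, and it uses the usual formulation over prefixes, whose boundary conventions may differ slightly from the traversal-based definition in Section 2. It also checks the equivalence between LCS and Edit(1, 1, 0, 2), which in this formulation reads $n + m - 2\,\mathrm{LCS}(x, y)$.

```python
def edit_distance(x, y, c_del_x, c_del_y, c_match, c_subst):
    """Textbook O(nm) dynamic program over prefixes of x and y."""
    n, m = len(x), len(y)
    # D[i][j] = cheapest cost of turning the prefix x[:i] into the prefix y[:j]
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + c_del_x
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + c_del_y
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = c_match if x[i - 1] == y[j - 1] else c_subst
            D[i][j] = min(D[i - 1][j] + c_del_x,      # deletion in x
                          D[i][j - 1] + c_del_y,      # deletion in y
                          D[i - 1][j - 1] + diag)     # matching or substitution
    return D[n][m]


def lcs_length(x, y):
    """Length of a longest common subsequence, by the standard O(nm) recurrence."""
    n, m = len(x), len(y)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[n][m]


# Levenshtein distance is Edit(1, 1, 0, 1)
print(edit_distance("kitten", "sitting", 1, 1, 0, 1))  # prints 3
# LCS is equivalent to Edit(1, 1, 0, 2): the distance equals n + m - 2 * LCS
x, y = "0110", "101"
print(edit_distance(x, y, 1, 1, 0, 2), len(x) + len(y) - 2 * lcs_length(x, y))
```

Setting the substitution cost to 2 makes a substitution exactly as expensive as deleting in both strings, which is why only the number of matched positions matters for Edit(1, 1, 0, 2).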

Edit distance has a natural dynamic programming algorithm with running time $O(nm)$, which is taught in many undergraduate algorithms courses. Since such string distance measures have many applications in bioinformatics and data comparison, Levenshtein distance and LCS are well-studied, with a rich literature focusing on approximation algorithms (see, e.g., […]) and algorithms that perform well on special cases (see, e.g., [12] and see [7] for a survey). However, the best-known worst-case running time (of an exact algorithm) is $O(nm/\log n + n)$ [16], i.e., algorithms are stuck slightly below quadratic time. Even if we restrict the input to strings over a binary alphabet $\{0, 1\}$, no better worst-case running time is known. In this paper we present a possible explanation for this situation by proving conditional lower bounds for edit distance on binary strings, thus improving and generalizing the known quadratic-time hardness for the Levenshtein distance on alphabet size 4 [6].

Dynamic Time Warping (DTW) Fix a metric space $(M, d)$. A sequence of points in $M$ is
called a curve. Consider two curves $x, y$ of length $n, m$ ($n \ge m$). We may traverse $x$ and $y$ by starting in their first entries, in any time step advancing to the next entry in $x$ or $y$ or both, and ending in their last entries (see Section 2 for details). The cost of such a traversal is the sum over all points in time of the distance between the current entries. The dynamic time warping distance of $x$ and $y$ is the minimal cost of any traversal. This similarity measure can, e.g., readily detect whether two given signals are equal up to time accelerations or decelerations. This property, among others, makes it a very useful measure in practice, with many applications in comparing temporal data such as video and audio, e.g., for speech recognition or music processing (see, e.g., [20]). The best-known worst-case running time is achieved by a simple dynamic programming algorithm that computes the DTW distance of $x$ and $y$ in time $O(nm)$. To break this apparent barrier in practice, many heuristics have been designed for this problem (see, e.g., [21]).
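A minimal Python sketch of this simple $O(nm)$ dynamic program, assuming one-dimensional curves with $d(a, b) = |a - b|$ as the default distance (any other distance function can be passed in). The name `dtw` is our own. The first usage line illustrates the property mentioned above: a curve and a slowed-down copy of it have distance 0.

```python
import math


def dtw(x, y, d=lambda a, b: abs(a - b)):
    """O(nm) dynamic program for the DTW distance of curves x and y (0-indexed lists).

    D[i][j] is the cheapest cost of a traversal of x[:i+1], y[:j+1] that ends in
    the pair (i, j); every pair visited by the traversal is charged d(x[i], y[j]).
    """
    n, m = len(x), len(y)
    D = [[math.inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                best_prev = 0
            else:
                best_prev = min(
                    D[i - 1][j] if i > 0 else math.inf,                # advance in x only
                    D[i][j - 1] if j > 0 else math.inf,                # advance in y only
                    D[i - 1][j - 1] if i > 0 and j > 0 else math.inf,  # advance in both
                )
            D[i][j] = best_prev + d(x[i], y[j])
    return D[n - 1][m - 1]


print(dtw([0, 1, 2], [0, 0, 1, 1, 2]))  # a slowed-down copy: distance 0
print(dtw([0, 8], [0, 4, 8]))           # the extra point 4 must be charged against 0 or 8
```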

An important special case that frequently arises in practice is dynamic time warping on one-dimensional curves. Here the metric space is $M = \mathbb{R}$ and the distance measure is $d(a, b) := |a - b|$ for any $a, b \in M$. Even for this important special case the best-known algorithm takes time $O(nm)$. We provide a possible explanation for this situation by proving a conditional lower bound for DTW on one-dimensional curves.

1.1 Our Results 

Dynamic Time Warping As our first main result, we prove a conditional lower bound for 
DTW. This shows that strongly subquadratic algorithms for DTW can be considered unlikely to 
exist. Specifically, obtaining such algorithms is at least as hard as a breakthrough for satisfiability. 

Theorem 1.1. DTW on one-dimensional curves taking values in $\{0, 1, 2, 4, 8\} \subseteq \mathbb{R}$ has no $O(n^{2-\varepsilon})$ algorithm for any $\varepsilon > 0$, unless SETH fails.

Edit Distance Our second main result is a classification of Edit($c_{\text{del-x}}, c_{\text{del-y}}, c_{\text{match}}, c_{\text{subst}}$) for all operation costs $c_{\text{del-x}}, c_{\text{del-y}}, c_{\text{match}}, c_{\text{subst}}$: We identify trivial variants where the edit distance is independent of the input $x, y$, and only depends on $n, m$. In this case, it can be computed in constant time. For all remaining choices of the operation costs we prove quadratic-time hardness, even restricted to binary strings. This includes quadratic-time hardness of LCS and Levenshtein distance on binary strings. Compared to the known lower bound for Levenshtein distance [6], our result decreases the alphabet size from 4 to 2 and adds hardness of a large class of problems including LCS.

Theorem 1.2. Edit($c_{\text{del-x}}, c_{\text{del-y}}, c_{\text{match}}, c_{\text{subst}}$) can be solved in constant time if $c_{\text{subst}} = c_{\text{match}}$ or $c_{\text{del-x}} + c_{\text{del-y}} \le \min\{c_{\text{match}}, c_{\text{subst}}\}$. Otherwise, Edit($c_{\text{del-x}}, c_{\text{del-y}}, c_{\text{match}}, c_{\text{subst}}$) on binary strings has no $O(n^{2-\varepsilon})$ algorithm for any $\varepsilon > 0$, unless SETH fails.
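The trivial cases of Theorem 1.2 can be made concrete. The sketch below uses hypothetical helper names and the textbook prefix formulation of edit distance with $n \ge m$ (whose boundary conventions may differ slightly from Section 2); it returns a closed form in both cases and cross-checks it against the quadratic dynamic program on all short binary strings. The reasoning is elementary: if $c_{\text{del-x}} + c_{\text{del-y}} \le \min\{c_{\text{match}}, c_{\text{subst}}\}$, every diagonal step can be replaced by two deletions without increasing the cost; if $c_{\text{match}} = c_{\text{subst}}$, the cost only depends on the number $k$ of diagonal steps and is linear in $k$, so $k = 0$ or $k = m$ is optimal.

```python
def edit_distance(x, y, c_del_x, c_del_y, c_match, c_subst):
    """Reference O(nm) dynamic program over prefixes (textbook formulation)."""
    n, m = len(x), len(y)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            options = []
            if i > 0:
                options.append(D[i - 1][j] + c_del_x)
            if j > 0:
                options.append(D[i][j - 1] + c_del_y)
            if i > 0 and j > 0:
                diag = c_match if x[i - 1] == y[j - 1] else c_subst
                options.append(D[i - 1][j - 1] + diag)
            D[i][j] = min(options)
    return D[n][m]


def is_trivial(c_del_x, c_del_y, c_match, c_subst):
    """The two cases that Theorem 1.2 identifies as solvable in constant time."""
    return c_subst == c_match or c_del_x + c_del_y <= min(c_match, c_subst)


def trivial_edit_distance(n, m, c_del_x, c_del_y, c_match, c_subst):
    """Closed form for the trivial cases, depending only on n = |x| >= m = |y|."""
    assert n >= m and is_trivial(c_del_x, c_del_y, c_match, c_subst)
    all_deletions = n * c_del_x + m * c_del_y
    if c_del_x + c_del_y <= min(c_match, c_subst):
        # every diagonal step can be replaced by two deletions at no extra cost
        return all_deletions
    # c_match == c_subst: with k diagonal steps the cost is
    # k * c_match + (n - k) * c_del_x + (m - k) * c_del_y, linear in k, so k = 0 or k = m
    return min(all_deletions, m * c_match + (n - m) * c_del_x)


print(is_trivial(1, 1, 0, 1), is_trivial(1, 1, 0, 2))  # Levenshtein and LCS: not trivial
print(trivial_edit_distance(4, 3, 2, 1, -1, -1))
```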

As a first step of the hardness part of this theorem, for some $0 < c'_{\text{subst}} \le 2$ depending on $c_{\text{del-x}}, c_{\text{del-y}}, c_{\text{match}}, c_{\text{subst}}$ we reduce Edit(1, 1, 0, $c'_{\text{subst}}$) to Edit($c_{\text{del-x}}, c_{\text{del-y}}, c_{\text{match}}, c_{\text{subst}}$). This reduction is what fails for the trivial cases. Then we prove hardness of Edit(1, 1, 0, $c'_{\text{subst}}$) using a construction that is parameterized by $c'_{\text{subst}}$.

Unbalanced Inputs Our main results are most meaningful for inputs with $n \approx m$. It is conceivable that for unbalanced inputs, i.e., $m \ll n$, faster algorithms exist; say, the running time of $O(nm)$ could be reduced to $O(n + m^2)$. For DTW we show that such an improvement is unlikely, by proving that “for any $m$” no algorithm with running time $O((nm)^{1-\varepsilon})$ exists, assuming SETH. This is analogous to the situation for Frechet distance [8].


Theorem 1.3. Unless SETH fails, DTW on one-dimensional curves taking values in $\{0, 1, 2, 4, 8\}$ has no $O((nm)^{1-\varepsilon})$ algorithm for any $\varepsilon > 0$, and this even holds restricted to instances with $n^{\alpha - o(1)} \le m \le n^{\alpha + o(1)}$ for any $0 < \alpha < 1$.


For edit distance, Theorem 1.2 implies that there is no $O(m^{2-\varepsilon})$ algorithm for any $\varepsilon > 0$ (in the worst case over all strings $x, y$ with $|x| \le n$ and $|y| \le m$ for any $n \ge m$). Our reduction from SETH cannot result in unbalanced strings, and thus our lower bounds cannot go beyond excluding $O(m^{2-\varepsilon})$ algorithms. This behaviour hints at the possibility of an $O(n + m^2)$ algorithm for edit distance, and indeed there is an algorithm for LCS from 1977 due to Hirschberg [12] matching this time complexity. For completeness, we show that this algorithm can be generalized to edit distance.
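As a hedged illustration of why an $O(n + m^2)$ bound is plausible for unbalanced inputs, here is one standard way to compute LCS in time $O(n|\Sigma| + m^2)$: precompute next-occurrence pointers in $x$, then maintain, for every length $l$, the shortest prefix of $x$ that has a common subsequence of length $l$ with the processed prefix of $y$. We do not claim this is the exact formulation of [12] or of the generalization to edit distance in Theorem 1.4; the function names are our own, and the quadratic program is included only for cross-checking.

```python
def lcs_unbalanced(x, y):
    """LCS length in O(n * |alphabet| + m^2) time, where n = |x| and m = |y|."""
    n, m = len(x), len(y)
    alphabet = set(x) | set(y)
    # nxt[i][c] = smallest position p >= i with x[p] == c, or n if there is none
    nxt = [dict.fromkeys(alphabet, n) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        nxt[i].update(nxt[i + 1])
        nxt[i][x[i]] = i
    INF = n + 1
    # f[l] = length of the shortest prefix of x having a common subsequence of
    # length l with the part of y processed so far (INF if no such prefix exists)
    f = [0] + [INF] * m
    for j, c in enumerate(y):
        for l in range(j + 1, 0, -1):  # downwards, so f[l - 1] still refers to y[:j]
            if f[l - 1] <= n:
                p = nxt[f[l - 1]][c]   # first occurrence of c after that prefix
                if p < n:
                    f[l] = min(f[l], p + 1)
    return max(l for l in range(m + 1) if f[l] <= n)


def lcs_quadratic(x, y):
    """Reference O(nm) LCS dynamic program, used only for cross-checking."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, 1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


print(lcs_unbalanced("ABCBDAB", "BDCABA"), lcs_quadratic("ABCBDAB", "BDCABA"))
```

The inner loop runs over at most $m$ lengths for each of the $m$ symbols of $y$, so the only dependence on $n$ is the one-time construction of the pointer table.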


Theorem 1.4. Edit($c_{\text{del-x}}, c_{\text{del-y}}, c_{\text{match}}, c_{\text{subst}}$) has an $O(n + m^2)$ algorithm.


Thus, for unbalanced inputs DTW and edit distance differ in their behaviour, but using SETH 
we can readily explain this difference. 


Reductions from Longest Common Subsequence Note that any near-linear time reduction from LCS to another problem $P$ transfers the quadratic-time lower bound of LCS to $P$. We think that this notion of LCS-hardness could be used to prove lower bounds for many string problems (not only distance measures). To support this claim, we present two easy results in this direction.

A palindromic subsequence (also called symmetric subsequence) of a string $x$ of length $n$ is a subsequence $z$ that is the same as its reverse $\mathrm{rev}(z)$. Computing a longest palindromic subsequence is a popular exercise in undergraduate textbooks (e.g., […, Exercise 15-2]), since it can be easily solved by a reduction to LCS or by adapting the dynamic programming solution of LCS, both resulting in an $O(n^2)$ algorithm. A tandem subsequence of a string $x$ is a subsequence $z$ that can be written as the concatenation $z = yy$ of a string $y$ with itself. In contrast to longest palindromic subsequence, it is non-trivial to compute a longest tandem subsequence in time $O(n^2)$ [15]. We present reductions from LCS to both of these problems, which yield the following lower bounds.

Theorem 1.5. On binary strings, longest palindromic subsequence and longest tandem subsequence have no $O(n^{2-\varepsilon})$ algorithms for any $\varepsilon > 0$, unless SETH fails.
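The easy algorithmic side of the connection to LCS fits in a few lines of Python. This sketch (hypothetical function names) uses the known identity that the longest palindromic subsequence of $x$ has the same length as $\mathrm{LCS}(x, \mathrm{rev}(x))$, and computes a longest tandem subsequence naively by trying every split point $k$: $z = yy$ is a subsequence of $x$ exactly when $y$ is a common subsequence of $x[1..k]$ and $x[k+1..n]$ for some $k$. The result is an $O(n^3)$ baseline, not the $O(n^2)$ algorithm of [15]. Note that these are reductions to LCS; Theorem 1.5 needs reductions in the opposite direction.

```python
def lcs_length(x, y):
    """Standard O(|x| |y|) LCS dynamic program, keeping one row at a time."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, 1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def longest_palindromic_subsequence(x):
    # Known identity: the answer has the same length as LCS(x, rev(x)).
    return lcs_length(x, x[::-1])


def longest_tandem_subsequence(x):
    # z = yy is a subsequence of x iff, for some split k, y is a common subsequence
    # of x[:k] and x[k:]; trying all n + 1 splits gives a simple O(n^3) baseline.
    return max(2 * lcs_length(x[:k], x[k:]) for k in range(len(x) + 1))


print(longest_palindromic_subsequence("0010110"), longest_tandem_subsequence("0010110"))
```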

These results show that SETH-based lower bounds via LCS are applicable to string problems 
that are not necessarily similarity measures. 

1.2 Technical Contribution 

We introduce a framework for proving SETH-based lower bounds for similarity measures. It is based on a construction that we call an alignment gadget. Given instances $x_1, \dots, x_n$ and $y_1, \dots, y_m$, $m \le n$, an alignment gadget consists of two instances $x, y$ whose similarity $\delta(x, y)$ is closely related to $\sum_{(i,j) \in A} \delta(x_i, y_j)$, where $A = \{(i_1, 1), \dots, (i_m, m)\}$ is the best-possible ordered alignment of the numbers in $[m]$ to $[n]$ (for details see Section 3). We prove a quadratic lower bound for any similarity measure admitting an alignment gadget. This proof is a simplified version of a construction in the known lower bound for Levenshtein distance [6], which is also closely related to the lower bound for Frechet distance [8].

Working with our framework has two advantages: First, it unifies three constructions that are separate proof steps in other SETH-based lower bounds […], thus reducing the amount of work necessary to prove SETH-based lower bounds. Second, it hides the reduction from satisfiability, providing a level of abstraction that allows us to ignore the details of the satisfiability problem and instead focus on the details of the problem we reduce to. This makes it possible to tackle general problems such as Edit($c_{\text{del-x}}, c_{\text{del-y}}, c_{\text{match}}, c_{\text{subst}}$), where the reduction depends on parameters of the problem, without resulting in an overly complex proof.

We present alignment gadgets for edit distance and dynamic time warping. This part needs 
careful problem-specific constructions. In particular, we have to construct instances where the 
optimal sequence of edit distance operations has some exploitable structure, which is made difficult by the fact that we work over a binary alphabet, so that in principle any two zeroes and any two ones can be matched.

1.3 Related Work 

Independently of our work, similar lower bounds for LCS and DTW have been shown by Abboud et al. [1]. Let us briefly compare our approaches. Our main technical contribution is the alignment framework, which allows us to give shorter hardness proofs. The proofs of Abboud et al. are longer, in particular since they are using the lower bound for Levenshtein distance [6], while our proofs are self-contained. The main technical contribution of Abboud et al., apart from careful reductions, seems to be that they reduce from a novel problem that they call Most-Orthogonal Vectors. Regarding the problem LCS, our hardness result is stronger, since we show hardness on binary strings, while Abboud et al. need alphabet size 7. Regarding DTW, we prove hardness of different special cases, as we consider DTW on one-dimensional curves over alphabets of size 5 (where the distance of two numbers is their absolute difference), while Abboud et al. consider DTW on strings over alphabets of size 5 (where the distance of two symbols is 1 or 0, depending on whether they are equal or not). On top of these core results, Abboud et al. generalize their result for LCS to $k$-LCS, the longest common subsequence of $k$ strings. We classify the complexity of edit distance for arbitrary operation costs and prove hardness of additional string problems via reductions from LCS.

1.4 Organization 

In Section 2 we fix notation and discuss alternative assumptions to SETH that can be used to prove our results. We present our framework for obtaining quadratic lower bounds in Section 3. We then first prove a conditional lower bound for LCS in Section 4; this proof is superseded by the conditional lower bound for edit distance in Section 5, but it is shorter and might be more accessible. Quadratic-time hardness of dynamic time warping follows in Section 6. Finally, in Section 7 we prove hardness of longest palindromic subsequence and longest tandem subsequence.

2 Preliminaries 

For a sequence $x$, we write $|x|$ for its length, $x[k]$ for its $k$-th entry, $x[k..\ell]$ for the substring from $x[k]$ to $x[\ell]$, and $\mathrm{rev}(x)$ for the reversed sequence. For sequences $x, y$ we denote their concatenation by $xy$. A traversal of two sequences $x, y$ of length $n, m$, respectively, is a sequence of pairs $((a_1, b_1), \dots, (a_t, b_t))$ with $t \in \mathbb{N}$ satisfying (1) $(a_1, b_1) = (1, 1)$, (2) $(a_t, b_t) = (n, m)$, and (3) $(a_{i+1}, b_{i+1})$ is either of $(a_i + 1, b_i)$, $(a_i, b_i + 1)$, or $(a_i + 1, b_i + 1)$ for all $1 \le i < t$.


Edit Distance Let $x, y$ be strings over an alphabet $\Sigma$ of length $n, m$ ($n \ge m$), respectively. For a traversal $T = ((a_1, b_1), \dots, (a_t, b_t))$ of $x, y$ we say that its $i$-th operation, $1 \le i < t$, is (1) a deletion in $x$ if $(a_{i+1}, b_{i+1}) = (a_i + 1, b_i)$, (2) a deletion in $y$ if $(a_{i+1}, b_{i+1}) = (a_i, b_i + 1)$, (3) a matching if $(a_{i+1}, b_{i+1}) = (a_i + 1, b_i + 1)$ and $x[a_i] = y[b_i]$, or (4) a substitution if $(a_{i+1}, b_{i+1}) = (a_i + 1, b_i + 1)$ and $x[a_i] \ne y[b_i]$. These four operations incur costs of $c_{\text{del-x}}$, $c_{\text{del-y}}$, $c_{\text{match}}$, and $c_{\text{subst}}$, respectively. We will always assume that these costs are rational constants, so that we can ignore representation issues. The cost $\delta_{\text{Edit}}(T)$ of a traversal $T$ is the total cost of all its operations. The edit distance $\delta_{\text{Edit}}(x, y)$ is the minimal cost of any traversal of $x, y$. We write Edit($c_{\text{del-x}}, c_{\text{del-y}}, c_{\text{match}}, c_{\text{subst}}$) for the problem of computing the edit distance of two given strings with costs $c_{\text{del-x}}, c_{\text{del-y}}, c_{\text{match}}$, and $c_{\text{subst}}$. We write Edit($c_{\text{subst}}$) as a shorthand for Edit(1, 1, 0, $c_{\text{subst}}$). Note that for these problems the costs of all four operations are constant, i.e., they stay fixed with growing $n, m$. We will mostly consider edit distance over binary strings, i.e., we set $\Sigma = \{0, 1\}$.
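To make the traversal-based definition concrete, here is a small Python sketch that checks conditions (1) to (3) for a traversal and evaluates $\delta_{\text{Edit}}(T)$ exactly as defined above, charging the $i$-th operation according to the pair $(a_i, b_i)$ and its successor. Pairs are 1-indexed as in the text; the function names are our own.

```python
def is_traversal(T, n, m):
    """Conditions (1)-(3): start at (1, 1), end at (n, m), and advance by (1,0), (0,1) or (1,1)."""
    if not T or T[0] != (1, 1) or T[-1] != (n, m):
        return False
    steps = {(1, 0), (0, 1), (1, 1)}
    return all((a2 - a1, b2 - b1) in steps for (a1, b1), (a2, b2) in zip(T, T[1:]))


def edit_cost(T, x, y, c_del_x, c_del_y, c_match, c_subst):
    """delta_Edit(T): the i-th operation is determined by (a_i, b_i) and (a_{i+1}, b_{i+1})."""
    assert is_traversal(T, len(x), len(y))
    total = 0
    for (a, b), (a2, b2) in zip(T, T[1:]):
        if (a2, b2) == (a + 1, b):
            total += c_del_x                  # deletion in x
        elif (a2, b2) == (a, b + 1):
            total += c_del_y                  # deletion in y
        elif x[a - 1] == y[b - 1]:            # diagonal step with x[a_i] = y[b_i]
            total += c_match                  # matching
        else:
            total += c_subst                  # substitution
    return total


T = [(1, 1), (2, 2), (3, 2), (4, 3)]
print(edit_cost(T, "0101", "011", 1, 1, 0, 1))
```

For $x = 0101$, $y = 011$ and this $T$, the three operations are a matching, a deletion in $x$, and a substitution, so its cost under Edit(1, 1, 0, 1) is 2.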

Dynamic Time Warping (DTW) Let $(M, d)$ be any metric space. Let $x, y$ be curves, i.e., sequences over $M$ of length $n, m$ ($n \ge m$), respectively. The cost $\delta_{\text{DTW}}(T)$ of a traversal $T = ((a_1, b_1), \dots, (a_t, b_t))$ is $\sum_{i=1}^{t} d(x[a_i], y[b_i])$. The dynamic time warping distance $\delta_{\text{DTW}}(x, y)$ is the minimal cost of any traversal of $x$ and $y$. We obtain the special case of dynamic time warping on one-dimensional curves by setting $M = \mathbb{R}$ and $d(a, b) := |a - b|$ for any $a, b \in M$.
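For tiny inputs, $\delta_{\text{DTW}}$ can be evaluated directly from this definition by enumerating all traversals. The following Python sketch (hypothetical names, exponential time, intended only for checking small cases) does exactly that, with $d(a, b) = |a - b|$ as the default distance.

```python
def dtw_cost(T, x, y, d=lambda a, b: abs(a - b)):
    """delta_DTW(T): sum of d(x[a_i], y[b_i]) over all pairs of T (1-indexed, as in the text)."""
    return sum(d(x[a - 1], y[b - 1]) for a, b in T)


def all_traversals(n, m):
    """Yield every traversal from (1, 1) to (n, m); there are exponentially many."""
    def extend(path):
        a, b = path[-1]
        if (a, b) == (n, m):
            yield list(path)
            return
        for da, db in ((1, 0), (0, 1), (1, 1)):
            if a + da <= n and b + db <= m:
                path.append((a + da, b + db))
                yield from extend(path)
                path.pop()
    yield from extend([(1, 1)])


def dtw_brute_force(x, y):
    """delta_DTW(x, y) straight from the definition: minimum cost over all traversals."""
    return min(dtw_cost(T, x, y) for T in all_traversals(len(x), len(y)))


print(dtw_brute_force([0, 8], [0, 4, 8]))
```

Unlike the edit-distance cost, every pair of the traversal (including the first and the last) contributes to $\delta_{\text{DTW}}(T)$.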

2.1 Hardness Assumptions 

Consider the Orthogonal Vectors problem (OV): Given sets $A, B$ of vectors in $\{0, 1\}^d$, $|A| = n$, $|B| = m$, decide whether there is a pair of vectors $a \in A$, $b \in B$ such that $a[k] \cdot b[k] = 0$ for all $k$ (which we denote by $\langle a, b \rangle = 0$). Clearly, this problem can be solved in time $O(n^2 d)$. The best-known algorithm runs in a time bound that is only slightly subquadratic for $d \gg \log n$.
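The trivial algorithm simply tests every pair. A minimal Python sketch, assuming vectors are given as tuples of 0/1 entries (the function name is our own):

```python
def has_orthogonal_pair(A, B):
    """Naive O(|A| * |B| * d) test for a in A, b in B with a[k] * b[k] = 0 for all k."""
    return any(all(ak * bk == 0 for ak, bk in zip(a, b)) for a in A for b in B)


A = [(1, 0, 1), (0, 1, 1)]
B = [(0, 1, 0), (1, 1, 0)]
print(has_orthogonal_pair(A, B))  # True: (1, 0, 1) and (0, 1, 0) are orthogonal
```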

Thus, the following hypotheses are reasonable. 

Orthogonal Vectors Hypothesis (OVH): For no $\varepsilon > 0$ is there an algorithm for OV, restricted to $n = m$, that runs in time $O(n^{2-\varepsilon}\,\mathrm{poly}(d))$.

Unbalanced Orthogonal Vectors Hypothesis (UOVH): Let $0 < \alpha < 1$. For no $\varepsilon > 0$ is there an algorithm for OV, restricted to $m = \Theta(n^\alpha)$ and $d \le n^{o(1)}$, that runs in time $O((mn)^{1-\varepsilon})$.

It is well-known that SETH implies OVH [23]. A slight generalization shows that SETH also 
implies UOVH. Hence, these hypotheses are weaker assumptions than SETH. 

Lemma 2.1. SETH implies OVH and UOVH. 

Proof. For OVH the statement follows from [23]. Let $0 < \varepsilon < 1/2$ and $0 < \alpha < 1$. Assume that Orthogonal Vectors, restricted to $m = \Theta(n^\alpha)$ and $d \le n^{o(1)}$, has an $O((nm)^{1-\varepsilon})$ algorithm. We show that this contradicts SETH. To this end, let $\varphi$ be an instance of $k$-SAT with $N$ variables and $M$ clauses. We use the sparsification lemma […], which yields $t := 2^{\varepsilon N/2}$ $k$-SAT instances $\varphi_1, \dots, \varphi_t$ with $N$ variables and $f(k, \varepsilon) \cdot N$ clauses such that $\varphi$ is satisfiable if and only if some $\varphi_i$ is satisfiable. If $N < f(k, \varepsilon)$ then we decide each $\varphi_i$ in time $O_{k,\varepsilon}(1)$. Otherwise, $\varphi_i$ has at most $N^2$ clauses, and we can assume equality by duplicating clauses. In this case, we construct an instance of Orthogonal Vectors as follows. Let $x_1, \dots, x_N$ be the variables and $C_1, \dots, C_{N^2}$ be the clauses of $\varphi_i$. We set $d := N^2$ and split the variables into the left half $x_1, \dots, x_{N/(1+\alpha)}$ and the right half $x_{N/(1+\alpha)+1}, \dots, x_N$. The set $A$ consists of one vector $a_z \in \{0, 1\}^d$ for every assignment $z$ of true and false to the left half of the variables. If $z$ causes clause $C_i$ to be true, i.e., some unnegated variable of $C_i$ is set to true in $z$ or some negated variable of $C_i$ is set to false in $z$, then we set $a_z[i] := 0$. Otherwise, we set $a_z[i] := 1$. Similarly, set $B$ has a vector $b_{z'}$ for any assignment $z'$ of true or false to the right half of the variables, and $b_{z'}[i] = 0$ or $1$, depending on whether $z'$ causes clause $C_i$ to be true. Then $\langle a_z, b_{z'} \rangle = 0$ if and only if $(z, z')$ forms a satisfying assignment of $\varphi_i$. Thus, we can decide $\varphi_i$ by solving the constructed instance of Orthogonal Vectors. Note that $n = |A| = 2^{N/(1+\alpha)}$ and $m = |B| = 2^{N\alpha/(1+\alpha)}$, so that indeed $m = \Theta(n^\alpha)$. Moreover, $d = N^2 \le 2^{o(N)} = n^{o(1)}$. Thus, we can apply the algorithm for Orthogonal Vectors that we assumed to exist, running in time $O((nm)^{1-\varepsilon}) = O(2^{(1-\varepsilon)N})$. Running this procedure for all $\varphi_i$ decides $\varphi$ in time $O(t \cdot 2^{(1-\varepsilon)N}) = O(2^{(1-\varepsilon/2)N})$, contradicting SETH. □
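The core of this proof, splitting the variables into two halves and recording which clauses each partial assignment already satisfies, can be checked on toy formulas. The Python sketch below is a simplification: it omits the sparsification step and the exact half sizes $N/(1+\alpha)$ and $N\alpha/(1+\alpha)$, and it assumes DIMACS-style clauses (a positive integer $v$ for $x_v$, $-v$ for its negation). All names are hypothetical.

```python
from itertools import product


def satisfies(clause, assignment):
    """True if the (possibly partial) assignment already makes the clause true.

    Clauses use a DIMACS-like encoding (our assumption): v stands for x_v, -v for its negation.
    """
    return any(abs(lit) in assignment and assignment[abs(lit)] == (lit > 0) for lit in clause)


def half_vectors(clauses, variables):
    """One 0/1 vector per assignment z of `variables`; entry i is 0 iff z satisfies clause i."""
    vectors = []
    for bits in product([False, True], repeat=len(variables)):
        z = dict(zip(variables, bits))
        vectors.append(tuple(0 if satisfies(c, z) else 1 for c in clauses))
    return vectors


def has_orthogonal_pair(A, B):
    return any(all(ak * bk == 0 for ak, bk in zip(a, b)) for a in A for b in B)


def satisfiable_via_ov(clauses, left, right):
    # <a_z, b_z'> = 0 iff every clause is satisfied by z or by z', i.e. (z, z') satisfies all clauses
    return has_orthogonal_pair(half_vectors(clauses, left), half_vectors(clauses, right))


def satisfiable_brute_force(clauses, variables):
    return any(all(satisfies(c, dict(zip(variables, bits))) for c in clauses)
               for bits in product([False, True], repeat=len(variables)))


formula = [[1, 2], [-1, 3], [-2, -3], [3, 4]]
print(satisfiable_via_ov(formula, [1, 2], [3, 4]), satisfiable_brute_force(formula, [1, 2, 3, 4]))
```

An unbalanced split, as in the proof, only changes the sizes of the `left` and `right` variable lists and hence the sizes $n$ and $m$ of the two vector sets.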


Thus, any lower bound conditional on OVH or UOVH also holds conditional on SETH. In fact, we prove all of our results by reductions from Orthogonal Vectors, so that in our results we may replace the assumption SETH by OVH or UOVH. Specifically, in Theorems 1.1, 1.2, and 1.5 we can replace SETH by OVH, and in Theorem 1.3 we can replace SETH by UOVH. We remark that a version of OVH has also been used in [1] and is implicit in many other SETH-based lower bounds.


3 Framework 

We consider a similarity (or distance) measure $\delta\colon \mathcal{I} \times \mathcal{I} \to \mathbb{N}_0$, where $\mathcal{I}$ denotes the set of inputs, e.g., all binary strings or all one-dimensional curves. By a reduction from Orthogonal Vectors, we prove that computing this similarity measure cannot be done in strongly subquadratic time unless SETH fails, if $\delta$ admits a gadget that allows us to exactly realize alignments of inputs $x_1, \dots, x_n \in \mathcal{I}$ and $y_1, \dots, y_m \in \mathcal{I}$. To formally state the requirement, we start by introducing the following notions.

Types In this paper, we define the type of a sequence $x \in \mathcal{I}$ to be its length and the sum of its entries, i.e., $\mathrm{type}(x) := (|x|, \sum_i x[i])$ (where for binary strings $\sum_i x[i]$ is to be interpreted as the number of ones in $x$). The definition of types can be customized to the similarity measure under consideration and is chosen to work for the problems considered in this paper. We define $\mathcal{I}_t := \{x \in \mathcal{I} \mid \mathrm{type}(x) = t\}$ as the set of inputs of type $t$.

Alignments Let $n \ge m$. A (partial) alignment is a set $A = \{(i_1, j_1), \dots, (i_k, j_k)\}$ with $0 \le k \le m$ such that $1 \le i_1 < \dots < i_k \le n$ and $1 \le j_1 < \dots < j_k \le m$. We say that $(i, j) \in A$ are aligned. Any $i \in [n]$ or $j \in [m]$ that is not contained in any pair in $A$ is called unaligned. We denote the set of all partial alignments (with respect to $n, m$) by $\mathcal{A}_{n,m}$.

We call the partial alignment $\{(\Delta + 1, 1), \dots, (\Delta + m, m)\}$, with $0 \le \Delta \le n - m$, a structured alignment. We denote the set of all structured alignments by $\mathcal{S}_{n,m}$.
Figure 1: Costs of Alignments ((b) cost $\delta(A)$ of a structured alignment $A \in \mathcal{S}_{n,m}$)


For any $x_1, \dots, x_n \in \mathcal{I}$ and $y_1, \dots, y_m \in \mathcal{I}$ we define the cost of an alignment $A \in \mathcal{A}_{n,m}$ by

$$\delta(A) = \delta_{y_1, \dots, y_m}^{x_1, \dots, x_n}(A) := \sum_{(i,j) \in A} \delta(x_i, y_j) + (m - |A|) \max_{i,j} \delta(x_i, y_j).$$

In other words, for any $j \in [m]$ which is aligned to some $i$ we pay the distance $\delta(x_i, y_j)$, while for any unaligned $j$ we pay the maximal distance of any pair $(x_{i'}, y_{j'})$ (note that there are $m - |A|$ unaligned $j \in [m]$, see Figure 1). This means that we get punished for any unaligned $j$.
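A direct Python transcription of this cost, assuming 1-indexed pairs as in the text and a user-supplied distance function `delta` (hypothetical names; the absolute difference of numbers serves only as an example distance):

```python
def is_partial_alignment(A, n, m):
    """A: collection of 1-indexed pairs (i, j); both coordinates must increase strictly together."""
    pairs = sorted(A)
    if len(pairs) > m or any(not (1 <= i <= n and 1 <= j <= m) for i, j in pairs):
        return False
    return all(i1 < i2 and j1 < j2 for (i1, j1), (i2, j2) in zip(pairs, pairs[1:]))


def alignment_cost(A, xs, ys, delta):
    """delta(A) = sum_{(i,j) in A} delta(x_i, y_j) + (m - |A|) * max_{i,j} delta(x_i, y_j)."""
    n, m = len(xs), len(ys)
    assert n >= m and is_partial_alignment(A, n, m)
    aligned = sum(delta(xs[i - 1], ys[j - 1]) for i, j in A)
    punishment = max((delta(x, y) for x in xs for y in ys), default=0)
    return aligned + (m - len(A)) * punishment


def absdiff(a, b):
    return abs(a - b)


print(alignment_cost([(2, 1), (3, 2)], [0, 1, 2], [1, 2], absdiff))
print(alignment_cost([(1, 1)], [0, 1, 2], [1, 2], absdiff))
```

With inputs $0, 1, 2$ and $1, 2$, the alignment $\{(2,1), (3,2)\}$ costs 0, whereas $\{(1,1)\}$ costs $1 + 2 = 3$, because the unaligned $j = 2$ is charged the maximal distance $|0 - 2| = 2$.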


Alignment Gadget We start with some intuition. Consider the problem of computing the value $\min_{A \in \mathcal{S}_{n,m}} \delta(A)$. This can be solved in time $O(nm)$ if each $\delta(x_i, y_j)$ can be evaluated in constant time, since $|\mathcal{S}_{n,m}| = n - m + 1 = O(n)$ and evaluating $\delta(A)$ amounts to computing $m$ values $\delta(x_i, y_j)$. Moreover, intuitively it should not be possible to compute this value in strongly subquadratic time. We will show that in some sense it is even hard to compute, in strongly subquadratic time, any value $v$ with

$$\min_{A \in \mathcal{A}_{n,m}} \delta(A) \le v \le \min_{A \in \mathcal{S}_{n,m}} \delta(A). \qquad (1)$$

Now, an alignment gadget is simply a pair of instances $(x, y)$ such that from $\delta(x, y)$ we can infer¹ a value $v$ as above. The main reason to relax our goal from computing $\min_{A \in \mathcal{S}_{n,m}} \delta(A)$ to satisfying (1) is that this makes constructing alignment gadgets much easier. Note that for the alignment gadget $(x, y)$ computing $\delta(x, y)$ is as hard as computing $\min_{A \in \mathcal{S}_{n,m}} \delta(A)$ (in an approximate sense as given by (1)), which we argued above should take quadratic time. This informal discussion motivates the following definition.
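Both ends of the interval in (1) can be computed with $O(nm)$ evaluations of $\delta$. The hedged Python sketch below (hypothetical names) computes the upper end by trying all $n - m + 1$ shifts, and the lower end with an LCS-like dynamic program, using that $\delta(A) = mM + \sum_{(i,j) \in A} (\delta(x_i, y_j) - M)$ with $M = \max_{i,j} \delta(x_i, y_j)$. The example $x = (0, 5, 0)$, $y = (0, 0)$ shows that the two ends can differ: skipping the middle entry gives cost 0, while every structured alignment costs 5.

```python
def min_structured(xs, ys, delta):
    """Minimum of delta(A) over structured alignments {(D+1, 1), ..., (D+m, m)}, 0 <= D <= n - m."""
    n, m = len(xs), len(ys)
    # |A| = m, so no punishment term; n - m + 1 shifts with m evaluations each: O(nm)
    return min(sum(delta(xs[D + j], ys[j]) for j in range(m)) for D in range(n - m + 1))


def min_all_alignments(xs, ys, delta):
    """Minimum of delta(A) over all partial alignments, via an LCS-like O(nm) dynamic program.

    With M = max_{i,j} delta(x_i, y_j) we have delta(A) = m*M + sum_{(i,j) in A} (delta(x_i, y_j) - M),
    so it suffices to minimize the total weight of an order-preserving set of pairs.
    """
    n, m = len(xs), len(ys)
    M = max(delta(x, y) for x in xs for y in ys)
    # best[i][j] = minimal total weight using only the first i x's and the first j y's
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = min(best[i - 1][j], best[i][j - 1],
                             best[i - 1][j - 1] + delta(xs[i - 1], ys[j - 1]) - M)
    return m * M + best[n][m]


def absdiff(a, b):
    return abs(a - b)


print(min_all_alignments([0, 5, 0], [0, 0], absdiff), min_structured([0, 5, 0], [0, 0], absdiff))
```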

Definition 3.1. The similarity measure $\delta$ admits an alignment gadget if the following conditions hold: Given instances $x_1, \dots, x_n \in \mathcal{I}_{t_X}$, $y_1, \dots, y_m \in \mathcal{I}_{t_Y}$ with $m \le n$ and types $t_X = (\ell_X, s_X)$, $t_Y = (\ell_Y, s_Y)$, we can construct new instances $x = \mathrm{GA}_X^{m, t_Y}(x_1, \dots, x_n)$ and $y = \mathrm{GA}_Y^{n, t_X}(y_1, \dots, y_m)$ and $C \in \mathbb{Z}$ such that

$$\min_{A \in \mathcal{A}_{n,m}} \delta(A) \le \delta(x, y) - C \le \min_{A \in \mathcal{S}_{n,m}} \delta(A). \qquad (2)$$

Moreover, $\mathrm{type}(x)$ and $\mathrm{type}(y)$ only depend on $n, m, t_X$, and $t_Y$. Finally, this construction runs in time $O((n + m)(\ell_X + \ell_Y))$.

If the construction additionally fulfills $|x| = O(n(\ell_X + \ell_Y))$ and $|y| = O(m(\ell_X + \ell_Y))$, then we say that $\delta$ admits an unbalanced alignment gadget.


¹ For us, “infer” will simply mean that $v = \delta(x, y) - C$ for an appropriate $C$.




Note that the types serve the purpose of simplifying the algorithmic problem in the above definition by restricting the inputs to same-type objects. If we can construct suitable $x$ and $y$ for arbitrary inputs $x_1, \dots, x_n$ and $y_1, \dots, y_m$, then we may completely disregard types.

Definition 3.2. The similarity measure $\delta$ admits coordinate values if there exist $0_X, 1_X, 0_Y, 1_Y \in \mathcal{I}$ satisfying

$$\delta(1_X, 1_Y) > \delta(0_X, 1_Y) = \delta(0_X, 0_Y) = \delta(1_X, 0_Y),$$

and moreover, $\mathrm{type}(0_X) = \mathrm{type}(1_X)$ and $\mathrm{type}(0_Y) = \mathrm{type}(1_Y)$.

Theorem 3.3. Let $\delta$ be a similarity measure admitting an alignment gadget and coordinate values and consider the problem of computing $\delta(x, y)$ with $|x| \le n$, $|y| \le m$, and $m \le n$. For no $\varepsilon > 0$ can this problem be solved in time $O(m^{2-\varepsilon})$, unless OVH fails. If $\delta$ even admits an unbalanced alignment gadget, then for no $\varepsilon > 0$ can this problem be solved in time $O((nm)^{1-\varepsilon})$, unless UOVH fails. Both statements hold restricted to $n^{\alpha - o(1)} \le m \le n^{\alpha + o(1)}$ for any $0 < \alpha < 1$.


3.1 Proof of Theorem 3.3


We present a reduction from OV to the problem of computing $\delta$. This uses constructions and arguments similar to […]. Consider an instance $a_1, \dots, a_n \in \{0, 1\}^d$ and $b_1, \dots, b_m \in \{0, 1\}^d$ of OV, $n \ge m$. We construct $x, y \in \mathcal{I}$ and $\rho \in \mathbb{N}_0$ such that $\delta(x, y) \le \rho$ if and only if there are $i \in [n]$ and $j \in [m]$ with $\langle a_i, b_j \rangle = 0$. To this end, let $a_i[k]$ denote the $k$-th component of $a_i$. For all $i \in [n]$ and $j \in [m]$, we construct coordinate gadgets as follows


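$$\mathrm{CG}(a_i, k) := \begin{cases} 0_X & \text{if } a_i[k] = 0,\\ 1_X & \text{if } a_i[k] = 1, \end{cases} \qquad \mathrm{CG}(b_j, k) := \begin{cases} 0_Y & \text{if } b_j[k] = 0,\\ 1_Y & \text{if } b_j[k] = 1, \end{cases} \qquad 1 \le k \le d,$$

$$\mathrm{CG}(a_i, d+1) := 0_X, \qquad \mathrm{CG}(b_j, d+1) := 1_Y.$$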


Note that we have $\mathrm{type}(\mathrm{CG}(a_i, 1)) = \dots = \mathrm{type}(\mathrm{CG}(a_i, d+1)) =: t_X$ and $\mathrm{type}(\mathrm{CG}(b_j, 1)) = \dots = \mathrm{type}(\mathrm{CG}(b_j, d+1)) =: t_Y$ by definition of coordinate values. This allows us to use the alignment gadget to obtain the following vector gadgets


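$$\begin{aligned} \mathrm{VG}(a_i) &:= \mathrm{GA}_X^{d+1, t_Y}(\mathrm{CG}(a_i, 1), \dots, \mathrm{CG}(a_i, d+1)),\\ \mathrm{VG}(b_j) &:= \mathrm{GA}_Y^{d+1, t_X}(\mathrm{CG}(b_j, 1), \dots, \mathrm{CG}(b_j, d+1)),\\ S &:= \mathrm{GA}_X^{d+1, t_Y}(\underbrace{0_X, \dots, 0_X}_{d \text{ times}}, 1_X). \end{aligned}$$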


Note that $\mathrm{type}(\mathrm{VG}(a_1)) = \dots = \mathrm{type}(\mathrm{VG}(a_n)) = \mathrm{type}(S) =: t'_X$ and $\mathrm{type}(\mathrm{VG}(b_1)) = \dots = \mathrm{type}(\mathrm{VG}(b_m)) =: t'_Y$, because the type of the output of the alignment gadget only depends on the number of input elements and their type, which are all $t_X$ or all $t_Y$, respectively. We introduce normalized vector gadgets as follows


$$\mathrm{NVG}(a_i) := \mathrm{GA}_X^{1, t'_Y}(S, \mathrm{VG}(a_i)), \qquad \mathrm{NVG}(b_j) := \mathrm{GA}_Y^{2, t'_X}(\mathrm{VG}(b_j)).$$
Figure 2: Schematic illustration of the coordinate, vector, and normalized vector gadgets. (a) Case $\langle a_i, b_j \rangle = 0$: aligning $\mathrm{VG}(b_j)$ with $\mathrm{VG}(a_i)$ achieves an alignment cost of $(d+1)\rho_0$. (b) Case $\langle a_i, b_j \rangle > 0$: aligning $\mathrm{VG}(b_j)$ with $S$ achieves an alignment cost of $d\rho_0 + \rho_1$.


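Note that we have $\mathrm{type}(\mathrm{NVG}(a_1)) = \dots = \mathrm{type}(\mathrm{NVG}(a_n)) =: t''_X$ and $\mathrm{type}(\mathrm{NVG}(b_1)) = \dots = \mathrm{type}(\mathrm{NVG}(b_m)) =: t''_Y$. We finally obtain $x$ and $y$ by setting

$$\begin{aligned} x &:= \mathrm{GA}_X^{m, t''_Y}(\mathrm{NVG}(a_1), \dots, \mathrm{NVG}(a_n), \mathrm{NVG}(a_1), \dots, \mathrm{NVG}(a_n)),\\ y &:= \mathrm{GA}_Y^{2n, t''_X}(\mathrm{NVG}(b_1), \dots, \mathrm{NVG}(b_m)). \end{aligned}$$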


We denote by $C, C', C''$ the value $C$ in the three invocations of Property (2) of the alignment gadget.

Observe that $x$ and $y$ have length $O((n + m)d)$ and can be constructed in time $O((n + m)d)$ by applying the algorithm implicit in Definition 3.1 three times. Moreover, if $\delta$ admits an unbalanced alignment gadget, then we have $|x| = O(nd)$ and $|y| = O(md)$. It remains to show that if we know $\delta(x, y)$ then we can decide the given OV instance in constant time, i.e., correctness of our construction, which we do below. This finishes our reduction from OV to the problem of computing $\delta$.

To obtain Theorem 3.3, let $0 < \alpha < 1$ and assume that $\delta(x', y')$ can be computed in time $O(M^{2-\varepsilon})$ whenever $|x'| \le N$, $|y'| \le M$, and $N^{\alpha - o(1)} \le M \le N^{\alpha + o(1)}$. Then in particular for $n = m$ we can compute $\delta(x, y)$ in time $O(\min\{|x|, |y|\}^{2-\varepsilon} + |x| + |y|) = O(((n + m)d)^{2-\varepsilon}) = O((nd)^{2-\varepsilon})$, contradicting OVH. In case of an unbalanced alignment gadget, assume that $\delta(x', y')$ can be computed in time $O((NM)^{1-\varepsilon})$ whenever $|x'| \le N$, $|y'| \le M$, and $N^{\alpha - o(1)} \le M \le N^{\alpha + o(1)}$. Then for $m = \Theta(n^\alpha)$ and $d \le n^{o(1)}$ we can compute $\delta(x, y)$ in time $O((|x||y|)^{1-\varepsilon} + |x| + |y|) = O(((nd)(md))^{1-\varepsilon} + (n + m)d) = O((nm)^{1-\varepsilon/2})$, contradicting UOVH. This proves Theorem 3.3.


Correctness We now prove correctness of our construction and refer to Figure 2 for an intuition for coordinate, vector, and normalized vector gadgets. Let $\rho_0 := \delta(0_X, 0_Y) = \delta(0_X, 1_Y) = \delta(1_X, 0_Y)$ and $\rho_1 := \delta(1_X, 1_Y)$. Recall that $\rho_0 < \rho_1$.

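Claim 3.4. For any $i \in [n]$, $j \in [m]$, if $\langle a_i, b_j \rangle = 0$, then $\delta(\mathrm{VG}(a_i), \mathrm{VG}(b_j)) = C + (d+1)\rho_0$. Otherwise, $\delta(\mathrm{VG}(a_i), \mathrm{VG}(b_j)) \ge C + d\rho_0 + \rho_1$. Moreover, $\delta(S, \mathrm{VG}(b_j)) = C + d\rho_0 + \rho_1$.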

Proof. If $\langle a_i, b_j \rangle = 0$, then the structured alignment $A = \{(1, 1), \dots, (d+1, d+1)\}$ has cost $\delta(A) = \sum_{k=1}^{d+1} \delta(\mathrm{CG}(a_i, k), \mathrm{CG}(b_j, k)) = (d+1)\rho_0$, since in each component $k$ at least one value is $0_X$ or $0_Y$, incurring a cost of $\rho_0$ (indeed even in position $k = d+1$ we have $\mathrm{CG}(a_i, d+1) = 0_X$). By definition of alignment gadgets, we obtain $\delta(\mathrm{VG}(a_i), \mathrm{VG}(b_j)) - C \le (d+1)\rho_0$. Moreover, since the cost $\delta(A)$ of any alignment $A \in \mathcal{A}_{d+1,d+1}$ consists of $d+1$ summands of the form $\delta(u_X, u_Y)$ with $u_X \in \{0_X, 1_X\}$, $u_Y \in \{0_Y, 1_Y\}$, we also have $\delta(\mathrm{VG}(a_i), \mathrm{VG}(b_j)) - C \ge (d+1)\rho_0$.

If $\langle a_i, b_j \rangle > 0$, then consider any $A \in \mathcal{A}_{d+1,d+1}$. If $|A| = d+1$ then $A = \{(1, 1), \dots, (d+1, d+1)\}$, and this alignment incurs a cost of at least $d\rho_0 + \rho_1$, since in at least one position $k$ we have $\mathrm{CG}(a_i, k) = 1_X$ and $\mathrm{CG}(b_j, k) = 1_Y$. Otherwise, if $|A| < d+1$, then $\delta(A)$ consists of $d+1$ summands of the form $\delta(u_X, u_Y)$ with $u_X \in \{0_X, 1_X\}$, $u_Y \in \{0_Y, 1_Y\}$, and at least one of these summands is the punishment term $\max_{k,\ell} \delta(\mathrm{CG}(a_i, k), \mathrm{CG}(b_j, \ell))$ because $|A| < d+1$. Since $\langle a_i, b_j \rangle > 0$, the punishment term is $\rho_1$ and we obtain $\delta(A) \ge d\rho_0 + \rho_1$. By definition of alignment gadgets, we have $\delta(\mathrm{VG}(a_i), \mathrm{VG}(b_j)) - C \ge d\rho_0 + \rho_1$.

We argue similarly for $\delta(S, \mathrm{VG}(b_j))$: The alignment $\{(1, 1), \dots, (d+1, d+1)\}$ incurs a cost of $d\rho_0 + \rho_1$, since the $(d+1)$-th component of $S$ is $1_X$ and that of $\mathrm{VG}(b_j)$ is $\mathrm{CG}(b_j, d+1) = 1_Y$, while all other components of $S$ are $0_X$. Moreover, any alignment with $|A| < d+1$ incurs a punishment term, so that it incurs cost of at least $d\rho_0 + \rho_1$. □


Claim 3.5. For any i S [n], j G [m], if (at, bj) = 0 then <5(NVG(aj), NVG(6j)) = C+C' + (d+l)po =: 
Pq. Otherwise, £(NVG(aj),NVG(frj)) = C + C' + dpo + p\ =: p}. 


Proof. Note that {(1,1)}, {(2,1)}, and 0 are the only alignments in *42,1, which corresponds to align¬ 
ing (S,YG(bj)) or (VG(aj), \G(bj)) or nothing. Moreover, the structured alignments are {(1,1)} 
and {(2,1)}. Observe that the cost of the alignment 0 is simply the maximum of the other two 
alignments. By Claim 3.4 if (a*, bj) = 0 then the minimal cost is C + (d + l)po, attained by align¬ 
ment {(2,1)}. Otherwise, the minimal cost is C + dpo + pi, attained by alignment {(1,1)}. By 
definition of alignment gadgets, this yields that <5(NVG(aj), NVG(fej)) — C' is equal to C +(d+ l)po 
or C + dpo + pi, respectively. □ 


The claim shows that <5(NVG(aj), NVG(6j)) attains one of only two values, depending on whether 
(ai,bj) = 0. 

Claim 3.6. If there is no i G [n],j G [to] with ( ai,bj) = 0, then 5(x,y) > C" + rnp\. Otherwise, 
5(x, y) < C" + (to. - 1 )p[ + p'q. 

Proof. If (ai,bj) > 0 for all i,j, then by the previous claim we have <5(NVG(aj),NVG(fej)) > for 
all i,j. Since the cost of any alignment consists of to summands of the form <5(NVG(aj), NVG(6j)) 
for some i, j , the cost of any alignment is at least rnp\. By definition of alignment gadgets, we 
obtain 5(x, y) — C" > mp' v 

If (aj., bj) = 0 for some i,j, then consider the structured alignment A = {(A + 1,1),..., (A + 
to, to)} with A := i — j if i > j, or A := n + i — j if i < j. Its cost consists of to summands, of which 
one is <5(NVG(aj),NVG(6j)) = p' 0 and all others are at most p{. Hence, the cost of A is at most 
(to — 1 )pi + p' 0 and by definition of alignment gadgets, we obtain 5(x, y) — C" < (m— l)p[ + p' 0 - □ 


By setting p := C" + (m — 1 )p{ + pg we have found a threshold such that 5(x,y) < p if and 
only if there is a pair (i,j) with (aj, bj) = 0. Thus, computing 5(x, y) allows to decide the given OV 


instance. This finishes the proof of Theorem 3.3 


4 Longest Common Subsequence 

In this section, we present an alternative hardness proof for longest common subsequence (LCS), 
which is shorter than for the more general problem Edit(cd e i- X , Cdel-y, c ma tch, Csubst) hr Section [5j 
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Given two strings x, y over an alphabet £, a longest common subsequence is a binary string z that 
appears in x and in y as a subsequence and has maximal length. We denote by LCS(x, y) some 
longest common subsequence of x and y. and by |LCS(x,y)| the length of any longest common 
subsequence of x and y. 

We present an alignment gadget and coordinate values for LCS over binary strings, i.e., we 
consider the set of inputs X := Ufc>o{^’ l} fc - Note that LCS is a maximization problem, but 
Definition |3.1| implicitly assumes a minimization problem, so we instead consider the number of 
unmatched symbols ^LCS^, 2 /) := \ x \ + \y\ ~ 2|LCS(x, y)\ for binary strings x,y. Note that this is 
equivalent to Edit(c<4 e l-x, Cdel-y, C ma tch, Csubst) for Qjel-x — djel-y — ^■ C ma tch — 0, and C su bst — 2 . 

Lemma 4.1. LCS admits coordinate values by setting 

l x := 11100, 0 X := 10011, 1 Y := 00111, 0 Y := 11001. 

Proof. All four values have the same length and the same number of Is, so they have equal type. 
Short calculations show that LCS(1 X , 1 Y ) = 111, LCS(1 X ,0 Y ) = 1100, LCS(0 X , 1 Y ) = 0011, and 
LCS(0 x ,0 Y ) = 1001. Thus, 4 = <5lcs(1x,1y) > 4cs(lx,0 Y ) = <5lcs(0 x ,1y) = 4cs(0 x ,l Y ) = 

2. n 


Definition 4.2. Consider instances x\,... ,x n E Lt x and J/i,... ,y m E Xt Y with n > m and types 
4 = (4,s x ),i Y = (4, s Y ). Set 7 i := 4 + 4,72 := 6(4 + 4), 73 := 10(4 + 4) + 2s x - 4,74 := 
13(4 + 4)- We guard the input strings by blocks of zeroes and ones, setting G(z) := l 72 0 71 z0 71 1 72 . 
We define the alignment gagdet as 


x := 


G(si) 0 73 G(x 2 ) 0 73 ... G(x n _i) 0 73 G(x n ), 
y := (4 74 G(yr) 0 73 C(y 2 ) 0 73 ... G(y m _i) 0 73 G (y m ) 0 " 7 + 

Lemma 4.3. Definition ^ -3\ realizes an alignment gadget for LCS. 

is applicable, implying a lower bound of 0(m 2 ~ £ ) for LCS. We remark 


Thus, Theorem 


3.3 


that our construction is no imbalanced alignment gadget, as the length of y grows linearly in n, 
not necessarily in m < n. Thus, we do not obtain a conditional lower bound of ^((nm) 1 ' 6 ) (for 
m ~ n a for any 0 < a < 1 ). 


Proof of Lemma 4-3 Observe that indeed x only depends on m,t Y , and x±,... ,x n , and type(x) 
only depends on n,m,t x , and t Y , and similarly for y. Moreover, x and y can clearly be constructed 
in time 0((n + m)( 4 + 4)), where 4 = |a+| = ■ • ■ = \x n \ and i Y = \y±\ = ... = \y m \- 
It remains to prove that by setting C := 27174 we have 


min (5(A) < 5(x,y) — C < min (5(A). (3) 

A.£.An,m AGSn.m. 

We first prove the following three useful observations. Here for a string z and indices a < b we 
denote the substring from z[a\ to z[b\ by z[a..b]. 

Claim 4.4. Let x and z \,..., Zk be binary strings. Set z = z\ ... z n . Then we have 

k 

4cs (x,z)= min V S LC s(x(zj), zfi), 

x(zi),...,x(z k ) ' 

J=1 

where x(z 1 ),... ,x(zk) range over all ordered partitions of x into k substrings, i.e., x{zfi) = x[io + 
l..ii],x(z 2 ) = x[h + l..* 2 ], - - - ,x(zk) = x[i k - 1 + 1 ..ik\ for any 0 = i 0 < h < ... < i k = 41- 
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Proof. For any ordered partition, the substrings x(zj) are disjoint and ordered along x, so we can 
take the longest common subsequences of (x(zj), Zj), j £ [A;], and concatenate them to form a 
common subsequence of (x,z). This shows |LCS(x, 2 )| > X^=i |LCS(x(,Zj), Zj)\. Since furthermore 

W = T/j =i I x{zj)\ and \z\ = Y,j=i \ z j I we obtain <5 lcsO,2) < Y,j=i <^lcs (x(zj),Zj). 

Now consider a longest common subsequence s of (x,y), which can be seen as a matching of 
symbols in x and y. Let Jj be the indices in x that are matched to symbols in Zj by s. Note that 
Ylj =i \ Jj\ = |LCS(x,y)|, as any matched symbol in x is matched to some Zj. Also note that the 
matching is ordered, meaning that for any i £ Jj and i' £ Jj' with j < j' we have i < i!. This allows 
to find an ordered partition x(z\),... ,x(zk) of x such that x(zj) contains the indices Jj for any j. 
Finally, for this partition we have LCS(x(£j), Zj) > \Jj\ so that Si,cs(x(zj), Zj) < \x(zj)\ + \zj\ — 2\Jj\. 
Summing up over j, we obtain J2j=i $LCs( x {zj), Zj) < \x\ + \z\ — 2|LCS(x, z)\ = 5 lcs(x, z). Together 
both halves of the proof imply the desired statement. □ 

Claim 4.5. Let z, w be binary strings and £, k £ No- Then we have (1) l k iv) = 5i,cs(z,w) 

and ( 2 ) <5lcs(0 £ £, l k w) > min{&:, <5 lcs(Z) Symmetrically, we have ( 2 ’) 5LCs(0 fc £, l l w) > 

minjfc, <5LCs(0 fc z, ic)}, and we obtain more symmetric statements by reversing all involved strings. 

Proof. (1) It suffices to show the claim for k = 1, then the general statement follows by induction. 
Consider a LCS s of (1 z, lw). At least one ’1’ is matched in s, as otherwise we can extend s by 
matching both ’l’s. If exactly one ’1’ is matched in s, then the other ’1’ is free, so we may instead 
match the two ’l’s. Thus, without loss of generality a LCS of (1 z, lw) matches the two ’l’s. This 
yields |LCS(lz, lu;)| = 1 + |LCS(z, w)\. Hence, <5 lcs(1a lw) = \lz\ + |lu;| — 2|LCS(lz, lw)| = 
\z\ + H - 2|LCS(z, io)| = <5 L cs(a,uO. 

(2) Fix a LCS s of (0 ^z, 1 k w). If s starts with a 0, then it does not contain the leading l k 
of the second argument, leaving at least k symbols unmatched, so that 1 k w) > k. Oth¬ 

erwise, if s starts with a 1 , then it does not contain the leading 0 ^ of the first argument, so that 
|LCS(0 £ z, l fc tc)| = |LCS(z, l fc u>)|. Then we have £lcs( 0^5 l k w) = | 0 ^ 2 ;| + |l fc u;| — 2 |LCS( 0 ^ 2 :, l fc tc)| > 
\z\ + |l fc w| - 2|LCS(^, l k w)\ = S LC s{z, 1 k w). □ 

Claim 4.6. Let £ > 0. For any prefix x' of x we have 5 lcs(*' c/ i 0^) > l. Moreover, if x' is of the form 
G(xi)0 73 .. .G(xj)0 73 for some 0 < i < n and £ > i ■ (272 + s x ), then 5 lcs(* 7;/ i 0 f ) = £. Symmetric 
statements hold for any suffix of x. 

Proof. We first show that for any i 6 [n] the string G(xi)0 73 contains as many ones as zeroes, and 
any prefix of G(xj)0 73 contains at least as many ones as zeroes. To this end, note that each x t has 
length £ x and contains s x ones, so that the number of ones of G(xj)0 73 is 272 + s x , while the number 
of zeroes is l x — s x + 271 + 73 , and we chose 73 such that both values are equal. For a prefix, note 
that G (X{) starts with 72 ones. Since each G(xj) contains 271 + £ x ~ s x < 72 zeroes, any prefix 
of G(xj) has as most as many zeroes as ones. Thus, we would have to advance to 0 73 to see more 
zeroes than ones, however, even G(xj)0 73 does not contain more zeroes than ones. 

Hence, any prefix x' of x contains at least as many ones as zeroes, implying |LCS(x / , (/)| < |x / |/ 2 . 
This yields ^lcsO^ 7 ) 0 £ ) = \ x '\ + |0 f | — 2|LCS(x / , 0^)| > £. If x' is of the form G(xi)0 73 ... G(xj)0 73 
and sufficiently many zeroes are available in then we have equality. □ 

Let us give names to the substrings consisting only of zeroes in x and y. In x, we denote the 
0 73 -block after G(xj) by Zf. i £ [n — 1]. In y, we denote the 0 73 -block after G (yj) by ZJ , j £ [m— 1]. 
Moreover, we denote the prefix 0 ” 74 by L Y and the suffix 0 ” 74 by R Y . 
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G(*i) 


Zf zl 


G(xa+i) 


yX yX 

^A+l ^A+m-l 


G(XA+m) 


z l +m zl_ i 


G(x n 


(V+(: *.+ 4 : ( tr- 1 ~)fpn ^] r ^-'[ir~) (TF)( I)'- ) • ■ ( ir- )(V ~) |f j f: • ^ [ o~" ~)fr ~) n 7 '-jo ¥ r '1 


/ / / 

/ / / // 


c 


QT174 


172 0 71 V\ P 71 l 72 0 73 • ■ • 0 73 l 72 P 71 Vm p 71 l 72 




0 n 74 


L y G( yi ) Zf Z Y _, G (y m ) R y 

Figure 3: Optimal traversal corresponding to structured alignment A = {(A +j,j) \ j G [m]} G S n ^ m . 

We now show the upper bound of ([3|, i.e., di,cs(x,y) < 2 n 74 + min^ e 5 n m <5(A). Consider a 
structured alignment A = {(A + l,l),...,(A + m, m)} G S nm . We construct an ordered partition 
of x as in Claim 4.4 by setting (see Figure [3| 


x(G(yj)) := G(x A +j) for j G [m], 
x{Zj) := Z\ +j for j G [m - 1], 
x(L Y ) :=G(x 1 )Zf...G(x A )Zl, 
x(R Y ) := Z\ +rn G{x A +m+i) ■ ■ ■ Z*_iG{x n ). 

Note that indeed these strings partition x and y, respectively. Thus, Claim |~i~T| yields 

m m— 1 

5lcs(+ y ) < <5lcs (x(L y ),L Y ) + Slcs{x(R y ), R y ) + y, <5 lcs(G(xa+j), G (yj)) + ^ ShCs( z A+ji Z J)- 

3= 1 3 = 1 

Since L Y = 0 n74 and x(L Y ) is a prefix of x of the correct form, by Claim|4~6|we have 6i,cs( x (L/ Y ), L Y ) = 


ny4 (note that 74 is chosen sufficiently large to make Claim 4.6 applicable). Similarly we obtain 
< 5 lcs (x(R y ), R y ) = 71.74. Since Zf = ZJ = 0 73 we have 5 lcs(^a+,j> Z J) = 0 - Finally, by matching 
the guarding ones and zeroes of G(xa+j) = 1 72 0 7 i £a+j 0 71 1 72 and G (yj) = l 72 0 71 yj0 71 l 72 we obtain 
5 LCs(G(xA+j),G(y J )) < 6 LC s(xA+j,yj)- Hence, we have 

$LCs(x,y) < 2n74 + ^2 ^LCS (xi,yj). 

(O')eA 

As A G 5 n ,m was arbitrary, we proved < 5 lcs(+ 2 /) < 27174 + min^ e 5 n m < 5 (A), as desired. 

It remains to prove the lower bound of (©, i.e., < 5 lcs(+ 2 /) > 27174 +min AeA n ,m S ( A )- As in 
let x(L Y ), x(G(yj)) for j G [m], x(ZJ) for j G [m — 1 ], x(R Y ) be an ordered partition of 


Claim 
x such 


4.4 

that 


m —1 


Slcs(x, y ) = Slcs(x(L y ), L Y ) + 5 LCS (x(R y ), R y ) + ^2 fiLCs(x(G( yj )), G( Vj )) + £ S LC s(x(Z Y ), Z Y ). 

3 = 1 i =1 

Clearly, we can bound 5lcs(x(ZJ), ZJ) > 0. Since L Y = 0 n74 and x(L Y ) is a prefix of x, by 


to construct an alignment A G A nm satisfying 


Claim 4.6 we have 5lcs(x(L y ), L Y ) > 7174 , and similarly we get 5lcs(x(R y ), R y ) > 7174 . It remains 

( 4 ) 


^(A)<^4cs^(G( % )),G(7/ i )), 

3 = 1 
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then together we have shown the desired inequality ^lcsOg y) + 27174 + min^ e _ 4 n <5(^4)- 

Let us construct such an alignment A. For any j £ [m], if x(G (yj)) contains more than half of 
some Xi> (which is part of G(a+)), then let i be the leftmost such index and align i and j. Note 
that the set A of all these aligned pairs (i,j) is a valid alignment in A n ,m, since no Xi or yj can be 
aligned more than once. 

Since by definition we have 8(A) = 4cs0*+ Vj) + (m — |A|) maxjj <5lcs4+ Uj) and since 

maxij 8i,cs(xi,yj) < max ? ;j(|xi| + \yj\) = 4 + 4 , in order to show Q it suffices to prove the 
following two claims. 

Claim, 4.7. For any aligned pair (i,j) E A we have 8 L cs(x(G(yj)),G(yj)) > 4cs(a+ 2 /j)- 


Proof. Recall that x(G (yj)) contains more than half of x'j. First consider the case that x(G(yj)) 
touches not only G(xt) but also G(a+) for some i' 7 ^ i. As between Xi and G(xy) there is at 
least one block of zeroes 0 73 and half of the guarding of G(xj) (i.e., 1 72 0 71 or 0 7l l 72 ), we obtain 
|x(G(yj))| > |0 73 | + |l 72 0 7l | = 73 + 72 + 7 i- Thus, any matching of x(G(yj)) and G (yj) leaves at 
least \x(G(yj))\-\G(yj)\ > (73 + 72 +7i) ~ ( 2 72 + 271 + 4) = 73 - 72 - 71-4 > 4 + 4 unmatched 
symbols, implying 8 LC s(x(G(yj)), G(yj)) > 4 + 4 > 8 LC s(xi,yj)- 

Now consider the remaining case, where x(G(yj)) touches no other G(av). In this case, x(G(yj)) 
is a substring of 0 73 G(xj)0 73 , i.e., we can write x(G(yj)) as 0 hL z0 hR , where z is a substring 
of G(xj). Since G (yj) starts with 72 ones, by Claim 4.5(2) we have 8i,cs(%(G(yj)),G(yj)) > 
min{ 72 , (Jlcs^O^, G(yj))}. Since 72 > 4+4 > 4cs(^i, Vj), it suffices to bound 5LCs(^0 hj *, G(y_,-)) 
from below. By a symmetric argument, we eliminate the block 0^ and only have to bound 
4cs ( z i G (yj)) from below. We can assume that \z\ > |G(yj)| — 72 , since otherwise <5lcs4, G (yj)) > 
72 > 4+4 > 4cs ( x i■, Vj)■ Thus, we have G (yj) = l 72 0 7 l ?/j0 7 l l 72 and can write z as l ri 0 7 l Xi0 7 l l ri? 

( 1 ) we have h L cs4, G(yj)) = 4cs(0 7 l .'Cj0 7 l l r + l 72 _ri 0 7 l yj 0 71 1 72 ). 


4.5 


with rr. rji > 0. By Claim 
By Claim 4.5 (2’), this yields < 5 lcs(+ G(ijj)) > min {71, 4 cs( 0 71 ^-'jO 71 0 71 ?/j0 71 + /2 )}, and since 

4 + 4 + 4cs ( x iiVj) it suffices to bound the latter term. By a symmetric argument we 


7i > 

eliminate the ones on the right side, and it suffices to bound 4 cs( 0 71 a: i 0 71 5 CWyjO 71 ) 

Hence 


Claim 4.5 (1) twice, this is equal to <Ilcs(2+ %)• 
f>LCs(x(G(yj)),G(yj)) > 8 LC s(xi,yj). 


Using 

we have shown the desired inequality 

□ 


Claim 4.8. If j is unaligned in A, then 8i,cs(x(G(yj)),G(yj)) > 4 + 4- 


Proof. Since x(G(yj)) contains less than half of any Xj, examining the structure of x we see that 
x(G(yj)) is a substring of P := XjO 71 l 72 0 73 l 72 0 71 Xj_|_i for some 1 < i < n, where at most half of 
Xi and Xj+i can be part of x(G(yj)). If x(G(yj)) contains ones to the left and to the right of 0 73 
in P, then x(G(yj)) contains at least 73 zeroes. Since G (yj) contains 271 + 4 — s Y < 271 + 4 
zeroes, at most 271+4 zeroes of x(G (yj)) can be matched, leaving at least 73 — 271 — 4 unmatched 
zeroes. Thus, 5j J cs(x(G(yj)),G(yj)) > 73 — 271 —4 + 4 + 4 - Otherwise, if x(G(yj)) contains 
only ones to the left of 0 73 in P (or only to the right), then x(G(yj)) contains at most 72 +4 ones. 
Thus, among the 272 + s Y > 272 ones of G (yj) at least 72 — 4 ones remain unmatched, implying 
4 cs 4 (G (yj)), G (yj)) > 72 - 4 > 4 + 4 - □ 


This finishes the proof of Lemma 4.3 


□ 


2 Actually x(G (yj)) could also be a substring of l T2 0 7l xi or of x n 0 71 l' y2 . We treat these border cases by setting 
£0 := £1 and x„+i := x n and letting from now on 0 < i < n. 
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5 Edit Distance 


We first show that the trivial cases of Edit(cdei-x, Cdel-y, c ma tch, c su bst) can be solved in constant time. 
For all other cases, on binary strings we present a reduction from Edit(cdei-x, Cdel-y, c ma t c h, c sub st) 
to Edit(Cg ubst ) and vice versa, see Section 5.1 Then in Section 5.2 we prove a conditional lower 
bound of 0{m 2 ~ £ ) for Edit(c su b s t) by applying our alignment-framework. Finally, in Section 5.3 we 


show that Edit(cdei-x) Cd e i- y , c ma tch) c au b s t) can be solved in time 0(n-\-m 2 ), which matches our lower 
bound. 


5.1 Easy Reductions 


All of our reductions are of the following form. Let E\ = Edit(cdei-x, Cdel-y, c ma tch) c su bst) and 
E 2 = Edit(c del _ x , c^gj , c^^, c^ ubst ) be two variants of the edit distance and denote the cost of 
any traversal T with respect to Ei by d Ei (T). We say that E\ and E 2 are equivalent, if there are 
constants a, (3 such that for any traversal T we have Se 1 (T) = a - 5 e 2 (T) + (3. Then the complexity 
of computing E\ and E 2 is asymptotically equal. 


Lemma 5.1. (1) Edit(c de i-x, Cdel-y, c matc h, c su bst) can be solved in constant time if c suhst = c matc h or 
Cdel-x T Cdei-y T min{cm a tch, c S ubst} • Otherwise, Fclit(cdei-x) Cdel-y: c ma tch: Csubst) on binary strings is 
equivalent to Edit(c( ubst ) on binary strings for some 0 < c(, ubst < 2. 

(2) Edit(cdei-X) Cdel-y, Cmatch, Csubst) is equivalent to Edit(c) lel _ x ,c , del _ y ,c^ atch ,c' ubs1 .) for some pos¬ 
itive integers c' del _ x , c^ el _ y , c' match , c' ubst . 


Note that by the first statement, hardness for general rational cost parameters follows by proving 
hardness of Edit(c(, ubst ) for 0 < c( ubst < 2. The second statement allows us to assume positive integer 
costs when giving an algorithm for Edit(cd e i- X , Cdel-y, c ma t c h, c su bst) in Section 


5.3 


Proof of Lemma \ 5. 1\ Let x,y be strings of length n,m. By symmetry, we may assume n > m. 
Observe that we can write the cost of any traversal T with respect to Edit(cd e i- X , c del-y, c ma t c h, c su bst) 
as 

AecI il (T) — A ■ C ma tch T B ■ C su bst T C • (cdel-x T Cdel-y) T (u Til ) • Cdel-x, 

for some A,B,C > 0 with A + B + C = m, since matchings and substitutions touch as many 
symbols in x as in y, so that we need exactly n — m more deletions in x than deletions in y. 

(1) If Cdel-x + Cdel-y < min{c ma tch, c su bst}, then we can replace any matching or substitution by 
a deletion in x and a deletion in y without increasing the cost. Thus, an optimal traversal has 
C = m and minimal cost n ■ Cdel-x + cn • Cdel-y, which can be computed in constant time. Similarly, 
if c ma tch = c su bst, then the minimal cost is independent of the symbols in x and y. We may 
arbitrarily set A + B and C subject to A + B + C = m and A + B, C > 0, and the minimal cost is 
m ■ min (cmatch; Cdel-x + Cdel-y} + (n — m) Cdel-x, which can be computed in constant time. 

Now assume that c match / c subst and c de i-x+Cdel- y > min{c ma tch, c su bst}- Restricting our attention 
to binary strings, by flipping all symbols in y but not in x we can swap the costs of matching and 
substitution. Thus, we may assume that c sub st > c ma tch (and Cd e i- X + Cdel-y > c ma tch)- We set 


c su bst •— ct( Csubst c ma tch) where 


a := 


^-del-x^^del-y ^match 


One can easily verify that for any traversal T with cost (fEdit(^) = A ■ c ma t c h + B ■ c su bst + C ■ (cdel-x + 
Cdel-y) + {n-m) ■ Cdel-x (with respect to Edit(c de i- X , c de i- y , c matc h, c su bst)) we have 


ad E dit(T) - am ■ c ma tch + (n - m)( 1 - ac de i- x ) = B ■ c' ubst + C ■ 2 + (n - m). 
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As the latter is the cost of T with respect to Edit(c / subst ), this proves that Edit(c' ubst ) is equivalent 
to Edit(cdei-x,Cdei-y,c m atch,c su bst)- Finally, note that c' ubst > 0. If c' ubst > 2, then we can replace it 
by 2 without changing the cost of the optimal traversal, since we can replace any substitution (of 
cost 2) by a deletion and an insertion (both of cost 1). This yields 0 < c(, ubst < 2. 

(2) Since we always assume all operation costs to be rationals, without loss of generality 
Cdel-x; Qel-y) c ma tch) c su bst have a common denominator D. We obtain positive integral operation costs 
by Setting . — -Dcdel-xTAf, Cd e l-y ■— -^Cdel-yTAf, Enatch '— -^dCmatch T2A7. C g||bs) .— -Dc su bst + 2M 

for a sufficiently large integer M. Both variants are equivalent, since <5Edit(^) is changed to 

D5Edit{T) + m ■ 2 M + (n — m) ■ M. □ 


5.2 Hardness Proof 

In this section we study the edit distance with matching cost 0, deletion and insertion cost 1, and 
substitution cost 0 < c su b s t < 2. We abbreviate c^Edit = ^Edit(c aubst )- 

Lemma 5.2. Edit(c su b s t) admits coordinate values by setting 

l x := 11100, 0 X := 10011, 1 Y := 00111, 0 Y := 11001. 

Proof. All four values have the same length and the same number of ones, so they have equal 
type. Using Fact |5.5| (1) (to be proven below), we have <5Edit(0 x , 0 Y ) = ^Edit(10011,11001) = 
^Edit (0011,1001) = <5Edit(001, 100). Depending on c su b s t> the optimal traversal of (001,100) is ei¬ 
ther to delete both ones or to substitute the first and last symbols. This yields dEdit(001,100) = 
min{2,2c subs t}- Similarly, we obtain (5 E dit(Lx, 0 Y ) = fedit(O x ,l Y ) = (5>Edit(0 x , 0 Y ) = min{2, 2c subst } 
and ^Edit(l.x, 1y) = fedit(11100, 00111) = min{4,4c subs t}- Hence, 6 E dit(lx, 1 Y ) > <^Edit(lx,0 Y ) = 

^Edit(Ox) 1 Y ) = ^Edit(Ox) Oy)- 

□ 

Definition 5.3. Consider instances x\,...,x n G It* and yi,.. . ,y m £ Pt Y with n > m and types 
tx = (tx,Sx),t Y = (£y,s y )- We define the parameters p := 2[1/c aub stl, 71 : = 10 p{lx + ^v), 72 := 
6/971 + 5s x — £ x , and 73 := 272 (since c su b s t is constant, these parameters are 0(£ x + ^y))- 

To guard a string by blocks of zeroes and ones, we set G(z) := (l 7l 0 7l ) p z(0 71 l 7l ) p . Now the 
alignment gagdet is 

x := G(xi) 0 72 G(x 2 ) 0 72 ... G(s n _i) 0 72 G(x n ), 
y := 0 n73 G(yi) 0 72 G (y 2 ) 0 72 ... G(y m _i) 0 72 G (y m ) 0 n ^ 3 . 

Let us provide some intuition on the complex guarding G(z), which contains more parts com¬ 
pared to the construction for LCS. Consider a block B = (l 7 0 7 ) p . Clearly, B can be completely 
matched to B, resulting in a cost of 0. Consider a slight perturbation B' of B by prepending A ones 
and deleting the last A zeroes. Then the edit distance of B and B' is at most 2A, since we may 
delete the prepended ones in B l and the additional zeroes at the end of B. Another upper bound 
for the edit distance of B and B' is 2 p- Ac su b s t; since we may match the first 7 ones, then substitute 
the next A symbols, then match the next 7 — A zeroes, and so on. By choosing p := 2|T/c su b s tl) 
the traversal using substitutions is more expensive, and indeed we prove that then the edit distance 
is at least 2A. This provides a building block where we got rid of substitutions and where slight 
perturbations are severely punished. Thus, our guarding G(z) = (l 7l 0 7l ) p z(0 7l l 7l ) p ensures that 
an optimal traversal of G(x) and G (y) aligns x and y, and this also holds after small perturbations. 
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Lemma 5.4. For any 0 < c su b s t < 2, Definition 5.3 realizes an alignment gadget for Edit(c su b s t)- 


Thus, Theorem 


3.3 


this with Lemma 5.1 proves Theorem 1.2 


is applicable, implying a lower bound of 0{m 2 £ ) for Edit(c su bst)- Combining 

We remark that our construction is no unbalanced 


alignment gadget, as the length of y grows linearly in n, not necessarily in m. Thus, we do not 
obtain a conditional lower bound of ^((nm) 1 ^) (i.e., not for m ~ n a for all 0 < a < 1 ), which in 


fact is ruled out by the algorithmic result of Theorem |1.4[ see Section 5.3 


In the proof of Lemma |5.4| we make use of the following basic observations. 


Fact 5.5. Let x , y , z be binary strings and £, k € No- Then we have (1) ^Edit(l fc a^, 1 k y) = ^Edit(^, y), 
(2) ^Edit(x, y) > | |x| — \y\ | and (3) I^EditO*^ y) ~ fedit^j y) | < \z\. We obtain symmetric statements 
by replacing all l’s by 0’s and by reversing all involved strings. 


Proof. We show (1) for k = 1, then the general statement follows by induction. Consider an optimal 
traversal T of lx, 1 y. If both ’l’s are deleted in T, then we can instead match them and improve T, 
contradicting optimality. If exactly one ’1’ is matched or substituted, then the other ’1’ is deleted, 
so we may instead match the two ’l’s without increasing cost. Thus, without loss of generality an 
optimal traversal of (lx, 1 y) matches the two ’l’s. 

For (2), note that matchings and substitutions touch as many symbols in x as in y. Hence, there 
have to be at least |x| — \y\ deletions in x and at least \y\ — |x| deletions in y. 

For (3), taking an optimal traversal of (x, y) and appending \z\ deletions of the symbols in z 
shows that ^Edit^A y) ^Edit^j y) + \z\. For the other direction, consider an optimal traversal T of 
{xz, y). Replace any matching or substitution of a symbol in z with a symbol y[j] in y by a deletion 
of y[j]. Also remove every deletion of a symbol in z. This results in a traversal T' of ( x,y ) with 
cost at most (lEdit^Z; y) + \z\, as we introduced at most \z\ deletions in y. This proves the desired 
inequality h E dit(ah v) < <$Edit (xz, y) + \z\. □ 


Fact 5.6. Let£, m, r > 0. Then for any x G {0 £ l m 0 r , l m £ r , l m ^0 r ,0^1 m r } we have SEdit(x,l m ) > 
\£-r\ + c su bst • min{C r}. 


Proof. Fact 


5.5 


( 2 ) yields <W 0 W m (C, l m ), 5 Edh (l m - £ ~ r , l m ) > £ + r > \i - r\ + c sub st ■ min{^ r}, 
since c su bst < 2. For x = 0 £ l m-r ', consider any optimal traversal T. If T substitutes s zeroes and 
deletes the remaining i — s zeroes, then <5Edit(0 £ l m-r , l m ) = c au bst • s + (£ — s) + 5Edit(l m_r j l m_s ). 

(1), $Edit (l m—r ; l m_s ) = h Ed it(e, ll r_s l) = [r — s|, where e is the empty string. Hence, 
", l m ) = mino< s <^{c su bst ■ s + £ — s + |r — s|}. A short case analysis shows that this 


By Fact 

^Edit(0 £ l 


5.5 


term is minimized for s = min{£, r}, where it evaluates to c su b s t ■ min{£, r} + £ + r — 2 mh\{£, r} = 
Csubst • min{£,r} + \£ — r\. The case x = l TO- m r is symmetric. □ 


For a string y and indices a < b we denote the substring from y\a\ to y[b\ by y\a..b\. 


Fact 5.7. Let x and yi,..., y* be binary strings. Set y = y\ ... y n . Then we have 


k 

fedit (x,y)= min V'^Edit (x{Vj),yj), 

x(y 1 ),...,x(y k ) “ 


where x(yi), ... ,x(yk) ranges over all ordered partitions of x into k substrings, i.e., x(y \) = x[io + 
l..zi],x(y 2 ) = x[h + I..Z 2 ], • • • ,x(y k ) = x[i k -1 + 1.-4] for any 0 = i 0 < i\ < ... < i k = \x\. 


18 












Proof. For any ordered partition, the substrings x(yj) are disjoint and ordered along x, so we can 
concatenate (optimal) traversals of (x(yj),yj), j £ [ k ], to form a traversal of (x,y). This shows 

£Edit(z,y) < Ej=i 5 Edit(x (Vj),yj)- 

Now let T be an optimal traversal of (x, y). Let Jj be the indices in x that appear in a matching 
or substitution operation with symbols in yj . Note that these sets are ordered, in the sense that 
for any i £ Jj and i! £ Jji with j < f we have i < i' . This allows to find an ordered partition 
x(yi),.. ., x(yk) of x such that x(yj) contains the indices Jj for any j. Let us denote the total 
cost of the substitutions involving yj by Sj. Since traversal T deletes | yj\ — \Jj\ symbols in yj and 
\x(yj)\ — \ Jj\ symbols in x(yj ), we have 5(T) = Y^j=i \Uj\ + \ x (Vj)\ ~ 2| Jj\ + Sj. Clearly, we can 
construct a traversal of (x(yj),yj) that follows the matchings and substitutions in Jj and deletes all 
other symbols, showing fedit (x(yj),yj) < \yj\ + \x(yj)\ — 2\Jj\ + Sj. By optimality of T, we obtain 
$Edit(x, V ) > Ei=i fedit {x(yj), yj)- □ 


Observe that indeed x only 
and type(x) only depends on n,m,t x , and t Y . and similarly for 


Proof of Lemma \5.4\ From now on let x,y be as in Definition |5.3 
depends on m, t Y , and x\,... ,x v 
y. Moreover, x and y can clearly be constructed in time 0((n + m)(£ x + ^y))> where £ x = |*i| = 
• • • = \x n \ and i Y = 1 2/11 = ... = \y m \. 

It remains to prove that for some C, we have 


min 5(A) < 6(x,y) — C < min 5(M). (5) 

A£*A n ,m A.(zS n,m 

We will set 

C := 2 n 73 - /3(n - m )(74 + 72 ), 

where 

P := 1 - c su bst/5 and 74 := 4/ryi + £ x . 

Note that 74 is the length of G(xj). 

Let us give names to the substrings consisting only of zeroes in x and y. In x, we denote the 
0 72 -block after G (xf) by Zf , i £ [n — 1]. In y. we denote the 0 72 -block after G (yj) by ZJ. j £ [m— 1]. 
Moreover, we denote the prefix 0 n73 by L v and the suffix 0 n73 by R Y . 

We first prove the crucial property that for any prefix x' of x the distance ^EditC^i L Y ) is 
essentially |L Y | — P\x'\ = 7173 — P\x'\. This is due to a careful choice of the parameters 7 i, 72 ,/ 0 - 

Claim 5.8. For any prefix x' of x we have SEdit(x', L Y ) > 7173 — f3\x'\, with equality if x' is of the 
form G(xi)0 72 ... G(x/)0 72 for any 0 < i < n. Symmetric statements hold for 5-pAit( x ". R Y ) where 
x" is any suffix of x. 

Proof. The parameter 73 is chosen such that \x'\ < \x\ < |L Y |: Indeed, |x| < 71 ( 4/771 + £ x + 72 ) < 
n ■ 272 < 7173 = |L Y |. Observe that all zeroes of x' can be matched to zeroes of L Y , while all ones 
of x' have to be substituted. The remaining zeroes of L Y have to be deleted. Denoting the number 
of ones in x' by £, we obtain dEdit(-'c / , L Y ) = £ ■ c su b s t + (|T Y | — |a? 7 1). We will show £ > (x'l/5, with 
equality if x' has the special form as in the statement. In other words, the relative number of ones 
£/\x'\ is at least 1/5, with equality if x' has the special form. This implies ^Edit^/ T Y ) > 7173 — /3\x'\, 
with equality if x' has the special form. 

Note that each Xi has length £ x and contains s x ones, so that G (xf)Zf = (l 7 l 0 7 l ) p Xj(0 7 l l 7 l ) p 0 72 
contains 2/771 + (£ x — s x ) + 72 zeroes and 2/371 + s x ones. The parameter 72 is chosen so that the 
number of zeroes is four times the number of ones, implying that the relative number of ones is 
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G(*i) 

1 (0.., z i ... 0) "' (0... Z 1 ... 0) 1 


| (0 . . . ^A+l ... 0] " ‘ (0 . . Z A+m-X . 0) | 


I (0 . . Z A+m. . . 0] ‘ ‘ ‘ (0 . . . Z n- 1 ... 0) | 

G(x„) 



Figure 4: Optimal traversal corresponding to structured alignment A = {(A +j,j) \ j G [m]} G S n m . 


1/5. Note that any prefix of (l'>' 1 0 7l ) p has relative number of ones at least 1 / 2 . Since x,0 71 has less 
than 271 zeroes and |(l 7 l 0 7l ) p | > 271 , any prefix of (l 71 0 71 ) p x l iT / ' has relative number of ones at 
least 1/4. Since any prefix of l ^ 1 ( 0 71 l 7l ) p_1 has relative number of ones at least 1 / 2 , any prefix of 
(l 7 i0 7 i) p Xj(0 71 l 7l ) p has relative number of ones at least 1/4. The relative number of ones decreases 
by adding any prefix of 0 72 , however, for the final string (l 7 l 0 7 l ) p Xi( 0 7 l l 7 l ) p 0 72 , we already argued 
that the relative number of ones is 1/5. This shows that the relative number of ones of any prefix 
of x is at least 1/5. □ 

We now show the upper bound of @, i.e., h E dit (x,y) < C + minAeS^ ^EditO^i, Vj)- 

Consider a structured alignment $A = \{(\Delta+1, 1), \ldots, (\Delta+m, m)\} \in \mathcal{S}_{n,m}$. We construct an ordered partition of $x$ as in Fact 5.7 by setting (see Figure 4)

$$
\begin{aligned}
x(G(y_j)) &:= G(x_{\Delta+j}) && \text{for } j \in [m],\\
x(Z^Y_j) &:= Z^X_{\Delta+j} && \text{for } j \in [m-1],\\
x(L^Y) &:= G(x_1)Z^X_1 \cdots G(x_\Delta)Z^X_\Delta,\\
x(R^Y) &:= Z^X_{\Delta+m}G(x_{\Delta+m+1}) \cdots Z^X_{n-1}G(x_n).
\end{aligned}
$$


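Note that indeed these strings partition $x$ and $y$, respectively. Thus, Fact 5.7 yields

$$\delta_{\mathrm{Edit}}(x,y) \le \delta_{\mathrm{Edit}}(x(L^Y), L^Y) + \delta_{\mathrm{Edit}}(x(R^Y), R^Y) + \sum_{j=1}^{m}\delta_{\mathrm{Edit}}(G(x_{\Delta+j}), G(y_j)) + \sum_{j=1}^{m-1}\delta_{\mathrm{Edit}}(Z^X_{\Delta+j}, Z^Y_j).$$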

Since $x(L^Y)$ is a prefix of $x$ of the correct form, by Claim 5.8 we have $\delta_{\mathrm{Edit}}(x(L^Y), L^Y) = n\gamma_3 - \beta|x(L^Y)|$. Symmetrically, we obtain $\delta_{\mathrm{Edit}}(x(R^Y), R^Y) = n\gamma_3 - \beta|x(R^Y)|$. Note that $|G(x_i)Z^X_i| = \gamma_4 + \gamma_2$, so that $|x(L^Y)| + |x(R^Y)| = (n-m)(\gamma_4 + \gamma_2)$. Moreover, as $Z^X_i = Z^Y_j = 0^{\gamma_2}$ we have $\delta_{\mathrm{Edit}}(Z^X_{\Delta+j}, Z^Y_j) = 0$. Finally, by matching all guarding zeroes and ones of $G(x_{\Delta+j}) = (1^{\gamma_1}0^{\gamma_1})^p x_{\Delta+j} (0^{\gamma_1}1^{\gamma_1})^p$ and $G(y_j) = (1^{\gamma_1}0^{\gamma_1})^p y_j (0^{\gamma_1}1^{\gamma_1})^p$ we conclude $\delta_{\mathrm{Edit}}(G(x_{\Delta+j}), G(y_j)) \le \delta_{\mathrm{Edit}}(x_{\Delta+j}, y_j)$. This yields

$$\delta_{\mathrm{Edit}}(x,y) \le 2n\gamma_3 - \beta(n-m)(\gamma_4 + \gamma_2) + \sum_{j=1}^{m}\delta_{\mathrm{Edit}}(x_{\Delta+j}, y_j) = C + \sum_{(i,j)\in A}\delta_{\mathrm{Edit}}(x_i, y_j).$$

As $A \in \mathcal{S}_{n,m}$ was arbitrary, the desired inequality follows.
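Because a structured alignment is determined by its shift $\Delta \in \{0, \ldots, n-m\}$, the minimum over $\mathcal{S}_{n,m}$ in this upper bound can be evaluated by trying all shifts. A minimal Python sketch, assuming a precomputed table `D` with `D[i][j]` $= \delta_{\mathrm{Edit}}(x_{i+1}, y_{j+1})$ (0-indexed); the table and the function name are hypothetical:

```python
def structured_min(D, n, m):
    """Minimum of sum_j D[shift + j][j] over all shifts 0 <= shift <= n - m.

    D is an n x m table with D[i][j] = delta(x_{i+1}, y_{j+1}) (0-indexed),
    so `shift` plays the role of Delta in A = {(Delta + j, j) : j in [m]}.
    Returns (cost, shift) for the first minimizing shift.
    """
    assert n >= m
    best = None
    for shift in range(n - m + 1):
        cost = sum(D[shift + j][j] for j in range(m))
        if best is None or cost < best[0]:
            best = (cost, shift)
    return best

# Toy 4 x 2 distance table (hypothetical values).
D = [[3, 1],
     [0, 2],
     [5, 0],
     [1, 4]]
print(structured_min(D, 4, 2))  # (0, 1)
```

This takes $O(nm)$ time given the table, which is all the upper bound needs.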

It remains to prove the lower bound of (5), i.e., $\delta_{\mathrm{Edit}}(x,y) \ge C + \min_{A\in\mathcal{A}_{n,m}}\delta(A)$. As in Fact 5.7, let $x(L^Y)$, $x(G(y_j))$ for $j \in [m]$, $x(Z^Y_j)$ for $j \in [m-1]$, $x(R^Y)$ be an ordered partition of $x$ such that

$$\delta_{\mathrm{Edit}}(x,y) = \delta_{\mathrm{Edit}}(x(L^Y), L^Y) + \delta_{\mathrm{Edit}}(x(R^Y), R^Y) + \sum_{j=1}^{m}\delta_{\mathrm{Edit}}(x(G(y_j)), G(y_j)) + \sum_{j=1}^{m-1}\delta_{\mathrm{Edit}}(x(Z^Y_j), Z^Y_j).$$
Figure 5: Illustration for the proof of Claim 5.9.


We define an alignment $A$ as follows. If there is some $i$ such that $x_i$ is contained in $x(G(y_j))$, then align $j$ with any such $i$. Otherwise leave $j$ unaligned.

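Claim 5.9. We have

$$\delta_{\mathrm{Edit}}(x(G(y_j)), G(y_j)) \ge \beta\big(\gamma_4 - |x(G(y_j))|\big) + \begin{cases} \delta_{\mathrm{Edit}}(x_i, y_j) & \text{if } j \text{ is aligned to } i,\\ \max_{i',j'}\delta_{\mathrm{Edit}}(x_{i'}, y_{j'}) & \text{if } j \text{ is unaligned.}\end{cases}$$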


Proof. If $|x(G(y_j))| \ge \gamma_2$, then $|x(G(y_j))| \ge \gamma_2 > \gamma_4 + 2(\ell_X + \ell_Y) \ge \gamma_4 + 2\max_{i',j'}\delta_{\mathrm{Edit}}(x_{i'}, y_{j'})$, and by $\beta \ge 1/2$ the right-hand side of the claim is at most 0, so the claim holds trivially. Otherwise $x(G(y_j))$ is shorter than any $Z^X_i = 0^{\gamma_2}$, implying that $x(G(y_j))$ is a substring of $0^{\gamma_2}G(x_i)0^{\gamma_2}$ for some $i \in [n]$.

We write $G(y_j)$ as $z_{-2p} z_{-2p+1} \cdots z_{2p-1} z_{2p}$, where $z_{-2k} = z_{2k} = 1^{\gamma_1}$, $z_{-2k+1} = z_{2k-1} = 0^{\gamma_1}$, and $z_0 = y_j$ (for all $1 \le k \le p$). As in Fact 5.7 we split up $x(G(y_j))$ into $x(z_k)$, $-2p \le k \le 2p$, such that $\delta_{\mathrm{Edit}}(x(G(y_j)), G(y_j)) = \sum_{k=-2p}^{2p}\delta_{\mathrm{Edit}}(x(z_k), z_k)$. Similarly, we write $G(x_i)$ as $w_{-2p} w_{-2p+1} \cdots w_{2p-1} w_{2p}$. We denote the distance of the start of $x(z_k)$ to the start of $w_k$ by $\Delta_L(k)$, i.e., if $x(z_k) = x[a..b]$ and $w_k = x[c..d]$ we set $\Delta_L(k) := |a - c|$. Similarly, we set $\Delta_R(k) := |b - d|$. For an illustration, see Figure 5. Note that $\Delta_R(k) = \Delta_L(k+1)$ holds for any $k$.

First assume (*): for some $k \neq 0$ the string $x(z_k)$ is longer than $\frac{5}{4}\gamma_1$ or $x(z_k)$ has less than $\frac{3}{4}\gamma_1$ common symbols with $z_k$. Then clearly $\delta_{\mathrm{Edit}}(x(G(y_j)), G(y_j)) \ge \delta_{\mathrm{Edit}}(x(z_k), z_k) \ge \frac{1}{4}\gamma_1$. By Fact 5.5 (2), we also have $\delta_{\mathrm{Edit}}(x(G(y_j)), G(y_j)) \ge |G(y_j)| - |x(G(y_j))| = \gamma_4 - |x(G(y_j))|$. As a linear combination of these two lower bounds, we obtain $\delta_{\mathrm{Edit}}(x(G(y_j)), G(y_j)) \ge \beta(\gamma_4 - |x(G(y_j))|) + (1-\beta)\frac{1}{4}\gamma_1$. Since $(1-\beta)\frac{1}{4}\gamma_1 = \frac{c_{\text{subst}}}{20}\gamma_1 > \ell_X + \ell_Y \ge \max_{i',j'}\delta_{\mathrm{Edit}}(x_{i'}, y_{j'})$, we have proven the statement in this case.

If (*) does not hold, then we have $\Delta_L(k), \Delta_R(k) \le \frac{1}{2}\gamma_1$ for any $|k| \ge 1$: it suffices to show the claim for any even $k \neq 0$, since $\Delta_R(k) = \Delta_L(k+1)$. For any even $k \neq 0$, the string $x(z_k)$ has to contain the majority of a block $w_\ell$ with even $\ell \neq 0$. Since the numbers of blocks are identical in $G(y_j)$ and $G(x_i)$, $x(z_k)$ has to contain the majority of $w_k$ for any even $k \neq 0$. Specifically, $x(z_k)$ contains at least $\frac{3}{4}\gamma_1$ symbols of $w_k$ and has length at most $\frac{5}{4}\gamma_1$, implying the desired inequalities for $\Delta_L(k), \Delta_R(k)$. Note that in this case $i$ and $j$ are aligned.
Note that for even $k \neq 0$ we obtain $x(z_k)$ from $w_k = z_k = 1^{\gamma_1}$ by either deleting a prefix of $\Delta_L(k)$ ones or prepending $\Delta_L(k)$ zeroes, and by either deleting a suffix of $\Delta_R(k)$ ones or appending $\Delta_R(k)$ zeroes. Hence, Fact 5.6 shows that

$$\delta_{\mathrm{Edit}}(x(z_k), z_k) \ge |\Delta_L(k) - \Delta_R(k)| + c_{\text{subst}} \cdot \min\{\Delta_L(k), \Delta_R(k)\}. \qquad (6)$$

The same argument works for any $k$ with $|k| > 1$. For $k \in \{-1, 1\}$ the argument does not work, since $z_{-1} = z_1 = 0^{\gamma_1}$ is not surrounded by blocks of $1^{\gamma_1}$. However, for $k \in \{-1, 1\}$ we have the weaker $\delta_{\mathrm{Edit}}(x(z_k), z_k) \ge |\Delta_L(k) - \Delta_R(k)|$ by Fact 5.5 (2). Moreover, by Fact 5.5 (3) we have

$$\delta_{\mathrm{Edit}}(x(z_0), z_0) \ge \delta_{\mathrm{Edit}}(x_i, y_j) - \Delta_L(0) - \Delta_R(0). \qquad (7)$$


Combining these inequalities yields $\delta_{\mathrm{Edit}}(x(G(y_j)), G(y_j)) \ge \delta_{\mathrm{Edit}}(x_i, y_j) + \Delta_L(-2p) + \Delta_R(2p)$, as we show in the following. This implies the desired statement, since $\Delta_L(-2p) + \Delta_R(2p) \ge \big||G(y_j)| - |x(G(y_j))|\big| = \big|\gamma_4 - |x(G(y_j))|\big| \ge \beta(\gamma_4 - |x(G(y_j))|)$. To show the claim, we set $s_L := \min\{\Delta_L(k) \mid -2p \le k \le 0\}$ and $s_R := \min\{\Delta_R(k) \mid 0 \le k \le 2p\}$. Note that $\Delta_L(k)$ has a total variation of at least $\Delta_L(-2p) - s_L + \Delta_L(0) - s_L$ over $k = -2p, \ldots, 0$, since it starts in $\Delta_L(-2p)$, changes to $s_L$, and then changes to $\Delta_L(0)$. Thus, summing $|\Delta_L(k) - \Delta_R(k)| = |\Delta_L(k) - \Delta_L(k+1)|$ over all $-2p \le k \le -1$ yields at least $\Delta_L(-2p) - s_L + \Delta_L(0) - s_L$. Moreover, for every $-2p \le k \le -2$ inequality (6) applies and the summand $c_{\text{subst}} \cdot \min\{\Delta_L(k), \Delta_R(k)\}$ is at least $c_{\text{subst}} \cdot s_L$. As the number of such $k$'s is $2p - 1 \ge 2/c_{\text{subst}}$, the total contribution of the summand $c_{\text{subst}} \cdot \min\{\Delta_L(k), \Delta_R(k)\}$ over all $k < 0$ is at least $2s_L$. Thus, we have

$$
\begin{aligned}
\sum_{k=-2p}^{-1}\delta_{\mathrm{Edit}}(x(z_k), z_k) &\ge \sum_{k=-2p}^{-1}|\Delta_L(k) - \Delta_R(k)| + \sum_{k=-2p}^{-2} c_{\text{subst}} \cdot \min\{\Delta_L(k), \Delta_R(k)\}\\
&\ge \big(\Delta_L(-2p) - s_L + \Delta_L(0) - s_L\big) + 2s_L \ge \Delta_L(-2p) + \Delta_L(0).
\end{aligned}
$$

Using a symmetric statement for the sum over all $k > 0$ as well as inequality (7), we obtain the desired inequality $\delta_{\mathrm{Edit}}(x(G(y_j)), G(y_j)) = \sum_{k=-2p}^{2p}\delta_{\mathrm{Edit}}(x(z_k), z_k) \ge \delta_{\mathrm{Edit}}(x_i, y_j) + \Delta_L(-2p) + \Delta_R(2p)$. □
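The last step of this proof is a purely arithmetic statement about the sequence $\Delta_L(-2p), \ldots, \Delta_L(0)$ (with $\Delta_R(k) = \Delta_L(k+1)$): if $(2p-1)c_{\text{subst}} \ge 2$, then the total variation plus the $c_{\text{subst}} \cdot \min$ terms is at least $\Delta_L(-2p) + \Delta_L(0)$. The randomized Python check below is only a sanity test of that inequality; the value ranges and function names are our own choice.

```python
import math
import random
from fractions import Fraction

def lhs(a, c):
    """a = [Delta_L(-2p), ..., Delta_L(0)]; uses Delta_R(k) = Delta_L(k + 1).

    Returns sum_{k=-2p}^{-1} |Delta_L(k) - Delta_R(k)|
          + c * sum_{k=-2p}^{-2} min{Delta_L(k), Delta_R(k)}.
    """
    tv = sum(abs(a[t] - a[t + 1]) for t in range(len(a) - 1))      # k = -2p .. -1
    mins = sum(min(a[t], a[t + 1]) for t in range(len(a) - 2))      # k = -2p .. -2
    return tv + c * mins

def check_tv(trials, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        c = Fraction(rng.randint(1, 40), 20)            # stand-in for c_subst in (0, 2]
        p = math.ceil((2 / c + 1) / 2)                  # smallest p with (2p - 1) c >= 2
        a = [rng.randint(0, 10) for _ in range(2 * p + 1)]
        assert lhs(a, c) >= a[0] + a[-1], (a, c)
    return True

print(check_tv(2000, seed=1))  # True
```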

Since $L^Y = 0^{n\gamma_3}$ and $x(L^Y)$ is a prefix of $x$, by Claim 5.8 we have $\delta_{\mathrm{Edit}}(x(L^Y), L^Y) \ge n\gamma_3 - \beta|x(L^Y)|$, and symmetrically we get $\delta_{\mathrm{Edit}}(x(R^Y), R^Y) \ge n\gamma_3 - \beta|x(R^Y)|$. By Fact 5.5 (2), we have $\delta_{\mathrm{Edit}}(x(Z^Y_j), Z^Y_j) \ge \big||Z^Y_j| - |x(Z^Y_j)|\big| \ge \beta(\gamma_2 - |x(Z^Y_j)|)$. Putting all of this together (using Claim 5.9 for each $j$), we obtain

$$\delta_{\mathrm{Edit}}(x,y) \ge 2n\gamma_3 + \beta\Big[\sum_{j=1}^{m}\big(\gamma_4 - |x(G(y_j))|\big) + \sum_{j=1}^{m-1}\big(\gamma_2 - |x(Z^Y_j)|\big) - |x(L^Y)| - |x(R^Y)|\Big] + \delta(A),$$

where we used $\delta(A) = \sum_{(i,j)\in A}\delta_{\mathrm{Edit}}(x_i, y_j) + (m - |A|)\max_{i,j}\delta_{\mathrm{Edit}}(x_i, y_j)$. Note that by definition of $x$ and since the strings $x(G(y_j))$, $x(Z^Y_j)$, $x(L^Y)$, $x(R^Y)$ partition $x$, we have

$$n\gamma_4 + (n-1)\gamma_2 = |x| = \sum_{j=1}^{m}|x(G(y_j))| + \sum_{j=1}^{m-1}|x(Z^Y_j)| + |x(L^Y)| + |x(R^Y)|.$$

Together, this yields the desired bound from below:

$$\delta_{\mathrm{Edit}}(x,y) \ge 2n\gamma_3 - \beta(n-m)(\gamma_4 + \gamma_2) + \delta(A). \qquad □$$
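The quantity $\min_{A\in\mathcal{A}_{n,m}}\delta(A)$ in this lower bound can itself be computed by an LCS-style dynamic program over the pairwise distances. The sketch below assumes (our reading of the definition in Section 3) that an alignment is a set of pairs $(i_1,j_1),\ldots,(i_k,j_k)$ with $i_1 < \cdots < i_k$ and $j_1 < \cdots < j_k$; each unaligned $y_j$ pays $\max_{i,j}\delta_{\mathrm{Edit}}(x_i,y_j)$ and unaligned $x_i$ are free. The distance table `D` and the function name are hypothetical.

```python
def min_alignment_cost(D):
    """min over monotone alignments A of
       sum_{(i,j) in A} D[i][j] + (m - |A|) * max(D),  for an n x m table D.

    f[i][j] is the best cost using x_1..x_i and y_1..y_j only.
    """
    n = len(D)
    m = len(D[0]) if n else 0
    dmax = max((v for row in D for v in row), default=0)
    f = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        f[0][j] = j * dmax                      # no x left: every y_j unaligned
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = min(f[i - 1][j],                          # x_i unaligned (free)
                          f[i][j - 1] + dmax,                   # y_j unaligned
                          f[i - 1][j - 1] + D[i - 1][j - 1])    # align (i, j)
    return f[n][m]

print(min_alignment_cost([[2, 9], [9, 9]]))  # 11
```

Since every structured alignment is a monotone alignment with $|A| = m$, this value never exceeds the structured minimum from the upper bound.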
5.3 Algorithm

For completeness, we prove a generalization of the algorithm of Hirschberg [12] from LCS to edit distance. Recall that the trivial dynamic programming algorithm computes a table storing all distances $\delta_{\mathrm{Edit}}(x[1..i], y[1..j])$. In contrast, we build a dynamic programming table storing for any index $j$ and any cost $k$ the minimal index $i$ with $\delta_{\mathrm{Edit}}(x[1..i], y[1..j]) - c_{\text{del-}x}(i - j) = k$. For some intuition, note that for $i \ge j$ at least $i - j$ symbols in $x[1..i]$ have to be deleted, so that the cost $\delta_{\mathrm{Edit}}(x[1..i], y[1..j])$ is at least $c_{\text{del-}x}(i - j)$. Thus, it makes sense to "normalize" the cost by subtracting $c_{\text{del-}x}(i - j)$. As we will see, the normalized cost is bounded by $O(m)$ (the length of the shorter of the two strings), which reduces the table size to $O(m^2)$.


Theorem 5.10. Let $c_{\text{del-}x}, c_{\text{del-}y}, c_{\text{match}}, c_{\text{subst}}$ be positive integers. $\mathrm{Edit}(c_{\text{del-}x}, c_{\text{del-}y}, c_{\text{match}}, c_{\text{subst}})$ can be solved in time $O((n + m^2)\log|\Sigma|)$ on strings of length $n, m$ with $n \ge m$ over alphabet $\Sigma$.

Note that it is easy to ensure $\Sigma \subseteq [n + m]$ after $O(n \log(\min\{|\Sigma|, n\}))$ preprocessing.³ Thus, the running time is at most $O((n + m^2)\log n) = \tilde{O}(n + m^2)$, and Theorem 1.4 follows from the above theorem and the second part of Lemma 5.1. Our algorithm is designed for the pointer machine model; on the Word RAM the log-factor can be improved.

Consider strings $x, y$ over alphabet $\Sigma$ of length $n, m$, respectively, $n \ge m$. For convenience, we set $\min\emptyset := \infty$. For any index $i \in \{0, \ldots, n\}$ and symbol $\sigma \in \Sigma$ we set

$$\mathrm{Next}^{=}_{\sigma}(i) := \min\{i' \mid i < i' \le n \text{ and } x[i'] = \sigma\},$$
$$\mathrm{Next}^{\neq}_{\sigma}(i) := \min\{i' \mid i < i' \le n \text{ and } x[i'] \neq \sigma\}.$$

We argue that a data structure can be built in $O(n\log|\Sigma|)$ preprocessing time supporting $\mathrm{Next}^{=}_{\sigma}(i)$ and $\mathrm{Next}^{\neq}_{\sigma}(i)$ queries in time $O(\log|\Sigma|)$. A simple solution with worse running time is to precompute all answers to all possible queries $\mathrm{Next}^{=}_{\sigma}(i)$ and $\mathrm{Next}^{\neq}_{\sigma}(i)$, with $i \in \{0, \ldots, n\}$, $\sigma \in \Sigma$, in time $O(|\Sigma|n)$ by one scan from $x[n]$ to $x[1]$. To improve the preprocessing time for $\mathrm{Next}^{\neq}_{\sigma}(i)$, note that $\mathrm{Next}^{\neq}_{\sigma}(i) = i + 1$ for all $\sigma \neq x[i+1]$. Thus, we only have to precompute $\mathrm{Next}^{\neq}_{x[i+1]}(i)$ (which can be done in time $O(n)$ by one scan from $x[n]$ to $x[1]$); then $\mathrm{Next}^{\neq}_{\sigma}(i)$ can be queried in time $O(1)$. For $\mathrm{Next}^{=}_{\sigma}(i)$, for any $i \in \{0, \ldots, n\}$ we build a dictionary $D_i$ storing $\mathrm{Next}^{=}_{\sigma}(i)$ for each $\sigma \in \Sigma$. Note that $D_i$ and $D_{i+1}$ differ only for the symbol $x[i+1]$. Thus, we can use persistent search trees as dictionary data structures, resulting in a preprocessing time of $O(n\log|\Sigma|)$ for building $D_0, \ldots, D_n$ and a lookup time of $O(\log|\Sigma|)$ for querying $\mathrm{Next}^{=}_{\sigma}(i)$. Using such a Next data structure, we can formulate our dynamic programming algorithm, see Algorithm 1.
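A simplified Python sketch of the Next queries follows. It keeps the $O(n)$ backward scan for $\mathrm{Next}^{\neq}$ described above but, as a swapped-in and simpler technique, answers $\mathrm{Next}^{=}$ with one sorted occurrence list per symbol and binary search ($O(\log n)$ per query) instead of persistent search trees. Positions are 1-indexed as in the text, $\infty$ is represented by `math.inf`, and the class name is hypothetical.

```python
import bisect
import math

INF = math.inf

class NextIndex:
    """Next^=_sigma(i) and Next^!=_sigma(i) for a string x, 1-indexed positions."""

    def __init__(self, x):
        self.x, self.n = x, len(x)
        self.occ = {}                                   # symbol -> sorted positions
        for pos, ch in enumerate(x, start=1):
            self.occ.setdefault(ch, []).append(pos)
        # diff[i] = min{i' > i : x[i'] != x[i+1]} for 0 <= i < n (one backward scan)
        self.diff = [INF] * (self.n + 1)
        for i in range(self.n - 2, -1, -1):
            self.diff[i] = i + 2 if x[i + 1] != x[i] else self.diff[i + 1]

    def eq(self, sigma, i):
        """Least i' with i < i' <= n and x[i'] == sigma, else INF."""
        if i == INF:
            return INF
        lst = self.occ.get(sigma, [])
        k = bisect.bisect_right(lst, i)
        return lst[k] if k < len(lst) else INF

    def neq(self, sigma, i):
        """Least i' with i < i' <= n and x[i'] != sigma, else INF."""
        if i == INF or i >= self.n:
            return INF
        # self.x[i] is the symbol x[i+1] in the text's 1-indexed notation
        return i + 1 if self.x[i] != sigma else self.diff[i]

nx = NextIndex("aabab")
print(nx.eq("b", 0), nx.eq("b", 3), nx.neq("a", 0), nx.neq("b", 3))  # 3 5 3 4
```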

Since $\mathrm{Next}^{=}$ and $\mathrm{Next}^{\neq}$ can be queried in time $O(\log|\Sigma|)$ and $M = (c_{\text{del-}x} + c_{\text{del-}y})m = O(m)$, Algorithm 1 runs in time $O(m^2\log|\Sigma|)$. Together with the preprocessing time for the Next data structure, we obtain a total running time of $O((n + m^2)\log|\Sigma|)$. It remains to argue correctness.


Correctness. We prove that the dynamic programming table $I[j,k]$ has the following meaning.

Lemma 5.11. Algorithm 1 computes for any $j \in [m]$, $k \in \mathbb{Z}$

$$I[j,k] = \min\{0 \le i \le n \mid \delta_{\mathrm{Edit}}(x[1..i], y[1..j]) - c_{\text{del-}x}(i - j) = k\}.$$

³ To compress the alphabet we build a balanced binary search tree $T$ whose nodes correspond to $\Sigma$ (by simply adding all symbols of $x$ and $y$ to $T$). Then we replace each symbol by its index in some fixed ordering of the nodes of $T$.
Algorithm 1 Algorithm for solving $\mathrm{Edit}(c_{\text{del-}x}, c_{\text{del-}y}, c_{\text{match}}, c_{\text{subst}})$ in time $O((n + m^2)\log|\Sigma|)$.

Assumption: $c_{\text{del-}x}, c_{\text{del-}y}, c_{\text{match}}, c_{\text{subst}}$ are positive integers
Input: strings $x, y$ of length $n, m$, $n \ge m$
Output: $\delta_{\mathrm{Edit}}(x,y)$

    $M \leftarrow (c_{\text{del-}x} + c_{\text{del-}y})\,m$
    Implicitly set $I[j,k] \leftarrow \infty$ for all $j$ and all $k < 0$ or $k > M$
    $I[0,0] \leftarrow 0$
    $I[0,k] \leftarrow \infty$ for $0 < k \le M$
    for $j = 1, \ldots, m$ do
        for $k = 0, \ldots, M$ do
            $I[j,k] \leftarrow \min\{\, I[j-1, k - c_{\text{del-}x} - c_{\text{del-}y}],\ \mathrm{Next}^{=}_{y[j]}(I[j-1, k - c_{\text{match}}]),\ \mathrm{Next}^{\neq}_{y[j]}(I[j-1, k - c_{\text{subst}}]) \,\}$
        end for
    end for
    return $c_{\text{del-}x}(n - m) + \min\{0 \le k \le M \mid I[m,k] < \infty\}$


Proof. Let $R[j,k] := \min\{0 \le i \le n \mid \delta_{\mathrm{Edit}}(x[1..i], y[1..j]) - c_{\text{del-}x}(i - j) = k\}$ be the right-hand side of the statement.

The statement is true for $j = 0$, since for the empty string $\varepsilon$ we have $\delta_{\mathrm{Edit}}(x[1..i], \varepsilon) = c_{\text{del-}x} \cdot i$, so that $R[0,k] = 0$ for $k = 0$, and $\infty$ otherwise, which is exactly how we initialize $I[0,k]$.

We show that $R[j,k] = \infty$ for $k < 0$ or $k > M$, which is also implicitly assumed for $I[j,k]$ in Algorithm 1. Note that for $i \ge j$ we have to delete at least $i - j$ symbols in $x[1..i]$ when traversing it with $y[1..j]$, which implies $\delta_{\mathrm{Edit}}(x[1..i], y[1..j]) \ge c_{\text{del-}x}(i - j)$. Since additionally for $i < j$ the term $-c_{\text{del-}x}(i - j)$ is positive, we have $\delta_{\mathrm{Edit}}(x[1..i], y[1..j]) - c_{\text{del-}x}(i - j) \ge 0$ for all $i, j$. Thus, for no $k < 0$ can we have $\delta_{\mathrm{Edit}}(x[1..i], y[1..j]) - c_{\text{del-}x}(i - j) = k$, implying $R[j,k] = \infty$ in this case. Moreover, $\delta_{\mathrm{Edit}}(x[1..i], y[1..j]) \le c_{\text{del-}x} \cdot i + c_{\text{del-}y} \cdot j$, which implies $\delta_{\mathrm{Edit}}(x[1..i], y[1..j]) - c_{\text{del-}x}(i - j) \le (c_{\text{del-}y} + c_{\text{del-}x})j \le M$. Thus, we also have $R[j,k] = \infty$ for $k > M$.
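The two range bounds just shown, $0 \le \delta_{\mathrm{Edit}}(x[1..i], y[1..j]) - c_{\text{del-}x}(i-j) \le (c_{\text{del-}x} + c_{\text{del-}y})j \le M$, can be spot-checked with the textbook quadratic dynamic program. A small Python sketch; the function names and the random test ranges are our own:

```python
import random

def edit_table(x, y, c_dx, c_dy, c_match, c_subst):
    """Textbook DP table: D[i][j] = delta_Edit(x[1..i], y[1..j])."""
    n, m = len(x), len(y)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = c_dx * i
    for j in range(1, m + 1):
        D[0][j] = c_dy * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = c_match if x[i - 1] == y[j - 1] else c_subst
            D[i][j] = min(D[i - 1][j] + c_dx,          # delete x[i]
                          D[i][j - 1] + c_dy,          # delete y[j]
                          D[i - 1][j - 1] + step)      # match or substitute
    return D

def normalized_in_range(x, y, c_dx, c_dy, c_match, c_subst):
    """Check 0 <= D[i][j] - c_dx*(i - j) <= (c_dx + c_dy)*j for all i, j."""
    D = edit_table(x, y, c_dx, c_dy, c_match, c_subst)
    return all(0 <= D[i][j] - c_dx * (i - j) <= (c_dx + c_dy) * j
               for i in range(len(x) + 1) for j in range(len(y) + 1))

rng = random.Random(4)
ok = True
for _ in range(300):
    x = "".join(rng.choice("ab") for _ in range(rng.randint(0, 7)))
    y = "".join(rng.choice("ab") for _ in range(rng.randint(0, 7)))
    ok = ok and normalized_in_range(x, y, *[rng.randint(1, 5) for _ in range(4)])
print(ok)  # True
```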

It remains to show the statement for $j \ge 1$ and $0 \le k \le M$. Inductively, we can assume that the statement holds for $j - 1$. We show that $R[j,k]$ satisfies the same recursive equation as $I[j,k]$ in Algorithm 1. Let $i := R[j,k]$ and consider an optimal traversal $T$ of $(x[1..i], y[1..j])$. We obtain a traversal $T'$ by removing the last operation in $T$.

If the last operation in $T$ is a deletion in $x$, then $T'$ is an optimal traversal of $(x[1..i-1], y[1..j])$ with cost $\delta_{\mathrm{Edit}}(x[1..i-1], y[1..j]) = \delta_{\mathrm{Edit}}(x[1..i], y[1..j]) - c_{\text{del-}x}$. Thus, we can decrease $i$ to $i - 1$ while keeping $k = \delta_{\mathrm{Edit}}(x[1..i], y[1..j]) - c_{\text{del-}x}(i - j) = \delta_{\mathrm{Edit}}(x[1..i-1], y[1..j]) - c_{\text{del-}x}(i - 1 - j)$. This contradicts minimality of $i = R[j,k]$, so the last operation in $T$ cannot be a deletion in $x$.

If the last operation in $T$ is a deletion in $y$, then $T'$ is an optimal traversal of $(x[1..i], y[1..j-1])$ with cost $\delta_{\mathrm{Edit}}(x[1..i], y[1..j-1]) = \delta_{\mathrm{Edit}}(x[1..i], y[1..j]) - c_{\text{del-}y}$. Thus, we have

$$
\begin{aligned}
R[j,k] &= \min\{0 \le i \le n \mid \delta_{\mathrm{Edit}}(x[1..i], y[1..j]) - c_{\text{del-}x}(i - j) = k\}\\
&= \min\{0 \le i \le n \mid \delta_{\mathrm{Edit}}(x[1..i], y[1..j-1]) - c_{\text{del-}x}(i - (j-1)) = k - c_{\text{del-}y} - c_{\text{del-}x}\}\\
&= R[j-1, k - c_{\text{del-}x} - c_{\text{del-}y}].
\end{aligned}
$$

If the last operation in $T$ is a matching of $x[i]$ and $y[j]$, then $T'$ is an optimal traversal of $(x[1..i-1], y[1..j-1])$ with cost $\delta_{\mathrm{Edit}}(x[1..i-1], y[1..j-1]) = \delta_{\mathrm{Edit}}(x[1..i], y[1..j]) - c_{\text{match}}$. This implies $\delta_{\mathrm{Edit}}(x[1..i-1], y[1..j-1]) - c_{\text{del-}x}((i-1) - (j-1)) = k - c_{\text{match}}$, so that $i - 1$ is a candidate for $R[j-1, k - c_{\text{match}}]$. Let $i' := R[j-1, k - c_{\text{match}}]$ and note that $i' \le i - 1$. As $x[i] = y[j]$, we obtain $i \ge \mathrm{Next}^{=}_{y[j]}(i') =: i^*$. In the following we show $i = i^*$. By definition of $i'$ we have $\delta_{\mathrm{Edit}}(x[1..i'], y[1..j-1]) - c_{\text{del-}x}(i' - j + 1) = k - c_{\text{match}}$. Hence, $\delta_{\mathrm{Edit}}(x[1..i^*], y[1..j]) \le c_{\text{match}} + c_{\text{del-}x}(i^* - i' - 1) + \delta_{\mathrm{Edit}}(x[1..i'], y[1..j-1]) = k + c_{\text{del-}x}(i^* - j)$. We even have equality, since otherwise $\delta_{\mathrm{Edit}}(x[1..i], y[1..j]) \le \delta_{\mathrm{Edit}}(x[1..i^*], y[1..j]) + c_{\text{del-}x}(i - i^*) < k + c_{\text{del-}x}(i - j)$, contradicting the definition of $i$. Thus, $i^*$ is a candidate for $R[j,k]$, implying that we also have $i \le i^*$. Hence, we have $R[j,k] = i^* = \mathrm{Next}^{=}_{y[j]}(i') = \mathrm{Next}^{=}_{y[j]}(R[j-1, k - c_{\text{match}}])$.

We argue analogously if the last operation in $T$ is a substitution of $x[i]$ and $y[j]$. This yields

$$R[j,k] = \min\big\{R[j-1, k - c_{\text{del-}x} - c_{\text{del-}y}],\ \mathrm{Next}^{=}_{y[j]}(R[j-1, k - c_{\text{match}}]),\ \mathrm{Next}^{\neq}_{y[j]}(R[j-1, k - c_{\text{subst}}])\big\}.$$

Hence, $R[j,k]$ satisfies the same recursion as $I[j,k]$, and we proved $R[j,k] = I[j,k]$ for all $j, k$. □
Lemma 5.12. Algorithm 1 correctly computes $\delta_{\mathrm{Edit}}(x,y)$.

Proof. Among all optimal traversals of $(x,y)$, pick a traversal $T$ that ends with the maximal number $d$ of deletions in $x$, and set $i := n - d$. Observe that $i$ is minimal with $\delta_{\mathrm{Edit}}(x[1..i], y[1..m]) + c_{\text{del-}x}(n - i) = \delta_{\mathrm{Edit}}(x,y)$, which is equivalent to $\delta_{\mathrm{Edit}}(x[1..i], y[1..m]) - c_{\text{del-}x}(i - m) = \delta_{\mathrm{Edit}}(x,y) - c_{\text{del-}x}(n - m) =: k$. Thus, $i = I[m,k] < \infty$, which implies that the return value of Algorithm 1 is at most $c_{\text{del-}x}(n - m) + k = \delta_{\mathrm{Edit}}(x,y)$.

Moreover, for any $k$ with $I[m,k] < \infty$ there is a $0 \le i \le n$ with $\delta_{\mathrm{Edit}}(x[1..i], y[1..m]) - c_{\text{del-}x}(i - m) = k$. By appending $n - i$ deletions in $x$ to any optimal traversal of $(x[1..i], y[1..m])$, we obtain $\delta_{\mathrm{Edit}}(x,y) \le \delta_{\mathrm{Edit}}(x[1..i], y[1..m]) + c_{\text{del-}x}(n - i) = k + c_{\text{del-}x}(n - m)$. Hence, the return value of Algorithm 1 is also at least $\delta_{\mathrm{Edit}}(x,y)$. □
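For illustration, here is a direct Python transcription of Algorithm 1, checked against the textbook $O(nm)$ dynamic program. To keep it short, the Next queries are answered by naive linear scans (our simplification), so this sketch does not achieve the $O((n+m^2)\log|\Sigma|)$ bound; only the table $I[j,k]$ and its recursion follow the text. Function names are hypothetical.

```python
import math

INF = math.inf

def next_eq(x, sigma, i):
    """Next^=_sigma(i): least i' with i < i' <= n and x[i'] == sigma (1-indexed).
    Naive linear scan, used here only for brevity."""
    if i == INF:
        return INF
    return next((q for q in range(i + 1, len(x) + 1) if x[q - 1] == sigma), INF)

def next_neq(x, sigma, i):
    """Next^!=_sigma(i): least i' with i < i' <= n and x[i'] != sigma."""
    if i == INF:
        return INF
    return next((q for q in range(i + 1, len(x) + 1) if x[q - 1] != sigma), INF)

def edit_normalized(x, y, c_dx, c_dy, c_match, c_subst):
    """Algorithm 1: row j of I stores, for each normalized cost k in 0..M,
    the least i with delta(x[1..i], y[1..j]) - c_dx * (i - j) = k."""
    n, m = len(x), len(y)
    assert n >= m
    M = (c_dx + c_dy) * m

    def get(row, k):  # entries with k < 0 or k > M are implicitly infinite
        return row[k] if 0 <= k <= M else INF

    row = [0] + [INF] * M  # j = 0: only k = 0 is attained (at i = 0)
    for j in range(1, m + 1):
        yj = y[j - 1]
        row = [min(get(row, k - c_dx - c_dy),                 # delete y[j]
                   next_eq(x, yj, get(row, k - c_match)),     # match y[j]
                   next_neq(x, yj, get(row, k - c_subst)))    # substitute y[j]
               for k in range(M + 1)]
    return c_dx * (n - m) + min(k for k in range(M + 1) if row[k] < INF)

def edit_dp(x, y, c_dx, c_dy, c_match, c_subst):
    """Textbook O(nm) dynamic program, used as a reference."""
    prev = [c_dy * j for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        cur = [c_dx * i]
        for j in range(1, len(y) + 1):
            step = c_match if x[i - 1] == y[j - 1] else c_subst
            cur.append(min(prev[j] + c_dx, cur[j - 1] + c_dy, prev[j - 1] + step))
        prev = cur
    return prev[-1]

print(edit_normalized("ab", "b", 1, 1, 1, 1), edit_dp("ab", "b", 1, 1, 1, 1))  # 2 2
```

Only row $j-1$ of $I$ is kept, since the recursion never looks further back.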


6 Dynamic Time Warping

We present coordinate values and an unbalanced alignment gadget for DTW on one-dimensional curves taking values in $\mathbb{N}_0$, i.e., we consider the set of inputs $\mathcal{X} := \bigcup_{k \ge 0}\mathbb{N}_0^k$.

Lemma 6.1. DTW admits coordinate values by setting

$$1_X := 1100, \quad 0_X := 0110, \quad 1_Y := 0011, \quad 0_Y := 1010.$$

Proof. All four values have the same length and sum of all entries, so they have equal type. Short calculations show that $4 = \delta_{\mathrm{DTW}}(1_X, 1_Y) > \delta_{\mathrm{DTW}}(0_X, 1_Y) = \delta_{\mathrm{DTW}}(0_X, 0_Y) = \delta_{\mathrm{DTW}}(1_X, 0_Y) = 1$. □
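The "short calculations" can be reproduced with the standard quadratic DTW recurrence. This Python sketch assumes the cost of a traversal is the sum of $|a - b|$ over traversed pairs with steps $(1,0)$, $(0,1)$, $(1,1)$, which we take to match the traversal notion of Section 2; the function name is ours.

```python
def dtw(a, b):
    """DTW distance of 1-D curves a, b: pair cost |a_i - b_j|, steps (1,0), (0,1), (1,1)."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = abs(a[i - 1] - b[j - 1]) + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

one_x, zero_x = [1, 1, 0, 0], [0, 1, 1, 0]
one_y, zero_y = [0, 0, 1, 1], [1, 0, 1, 0]
print(dtw(one_x, one_y), dtw(zero_x, one_y), dtw(zero_x, zero_y), dtw(one_x, zero_y))  # 4 1 1 1
```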

Definition 6.2. Consider instances $x_1, \ldots, x_n \in \mathcal{X}_{t_X}$ and $y_1, \ldots, y_m \in \mathcal{X}_{t_Y}$ with $n \ge m$ and types $t_X = (\ell_X, s_X)$, $t_Y = (\ell_Y, s_Y)$. We define $M := 2z$, where $z$ is the largest value contained in any of the one-dimensional curves $x_1, \ldots, x_n, y_1, \ldots, y_m$, and we set $\kappa := 3(\ell_X + \ell_Y)$. We construct

$$\mathrm{GA}^{m,t_Y}_X(x_1, \ldots, x_n) := M^\kappa x_1 M^\kappa x_2 M^\kappa \cdots M^\kappa x_n M^\kappa,$$
$$\mathrm{GA}^{n,t_X}_Y(y_1, \ldots, y_m) := M^\kappa y_1 M^\kappa y_2 M^\kappa \cdots M^\kappa y_m M^\kappa,$$

where $M^\kappa$ is to be understood as a sequence with $\kappa$ times the entry $M$.
Lemma 6.3. Definition 6.2 realizes an unbalanced alignment gadget for dynamic time warping.

Thus, Theorem 3.3 is applicable, implying that DTW on one-dimensional curves over $\mathbb{N}_0$ has no $O((nm)^{1-\varepsilon})$-time algorithm for any $\varepsilon > 0$ (under the Strong Exponential Time Hypothesis). To restrict the alphabet further, note that our basic values use the alphabet $\{0, 1\} \subseteq \mathbb{N}_0$ and each invocation of the alignment gadget introduces a new symbol which is twice as large as the largest value seen so far. Since in the proof of Theorem 3.3 we use alignment gadgets three times, we introduce the symbols 2, 4, and 8. In total, we prove quadratic-time hardness of DTW on one-dimensional curves taking values in $\{0, 1, 2, 4, 8\} \subseteq \mathbb{N}_0$. This proves Theorems 1.1 and 1.3.


Proof of Lemma 6.3. Observe that $x := \mathrm{GA}^{m,t_Y}_X(x_1, \ldots, x_n)$ and $y := \mathrm{GA}^{n,t_X}_Y(y_1, \ldots, y_m)$ can be computed in time $O((n + m)(\ell_X + \ell_Y))$, yielding strings of length $O(n(\ell_X + \ell_Y))$ and $O(m(\ell_X + \ell_Y))$, respectively. Moreover, $\mathrm{type}(x)$ and $\mathrm{type}(y)$ only depend on $t_X, t_Y, n, m$. It remains to show the inequalities (2) of Definition 3.1, for which we set $C := (n - m)(\ell_X M - s_X)$.

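We start with the following useful observations.

Claim 6.4. Let $\ell \ge 1$ and $a, a', b, b' \in \mathbb{N}_0$. For any $i \in [n]$, $j \in [m]$, we have

(1) $\delta_{\mathrm{DTW}}(x_i, M^\ell) \ge \delta_{\mathrm{DTW}}(x_i, M) = \ell_X M - s_X \ge \ell_X M/2$ and $\delta_{\mathrm{DTW}}(M^\ell, y_j) \ge \delta_{\mathrm{DTW}}(M, y_j) = \ell_Y M - s_Y \ge \ell_Y M/2$,

(2) $\delta_{\mathrm{DTW}}(x_i, y_j) \le (\ell_X + \ell_Y)M/2$,

(3) $\delta_{\mathrm{DTW}}(x', M^\kappa) \ge \kappa M/2$ and $\delta_{\mathrm{DTW}}(M^\kappa, y') \ge \kappa M/2$ for any substrings $x'$ of $x_i$ and $y'$ of $y_j$,

(4) $\delta_{\mathrm{DTW}}(M^a x_i M^{a'}, M^b y_j M^{b'}) \ge \delta_{\mathrm{DTW}}(x_i, y_j)$.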

Proof. For (1), observe that each symbol of $x_i$ can only be traversed together with the symbol $M$, and hence

$$\delta_{\mathrm{DTW}}(x_i, M^\ell) \ge \delta_{\mathrm{DTW}}(x_i, M) = \sum_{k=1}^{\ell_X}|M - x_i[k]| = \ell_X M - \sum_{k=1}^{\ell_X}x_i[k] = \ell_X M - s_X.$$

Since $x_i[k] \le z = M/2$, we have $s_X \le \ell_X M/2$. The statement for $y_j$ is symmetric.

For (2) and (3), note that all symbols of $x_i$ and $y_j$ (and hence of $x'$ and $y'$) are in $[0, z]$. Hence, we obtain $\delta_{\mathrm{DTW}}(x_i, y_j) \le \max\{|x_i|, |y_j|\} \cdot z \le (\ell_X + \ell_Y)M/2$. Likewise, $\delta_{\mathrm{DTW}}(x', M^\kappa) \ge \kappa(M - z) = \kappa M/2$. The inequality for $y'$ follows symmetrically.

To prove (4), consider an optimal traversal $T$ of $M^a x_i M^{a'}$ and $M^b y_j M^{b'}$. We construct a traversal $T'$ of $x_i$ and $y_j$ that has no larger cost. If $T$ does not already traverse $x_i[1]$ together with $y_j[1]$, then at some step in $T$ either a symbol in $x_i$ is traversed together with a symbol of the prefix $M^b$ or a symbol in $y_j$ is traversed together with a symbol of the prefix $M^a$. Let us assume the first case, since the second is symmetric. A contiguous part $T_H$ of $T$ consists of traversing a prefix $x'$ of $x_i$ together with all symbols in $M^b$, incurring a cost of at least $|x'|M/2$. Let $T_R$ be the remaining part of $T$ after $T_H$. We construct a traversal $T''$ of $x_i M^{a'}$ and $y_j M^{b'}$ as follows. We first traverse $x'$ together with $y_j[1]$ and then follow $T_R$, which is possible since $T_R$ starts at $y_j[1]$. Since traversing $x'$ together with $y_j[1]$ incurs a cost of at most $|x'|z = |x'|M/2$, which is at most the cost of $T_H$, the cost of our constructed traversal $T''$ is no larger than the cost of $T$. Symmetrically, we eliminate the suffixes $M^{a'}$ and $M^{b'}$ and construct a traversal $T'$ of $x_i$ and $y_j$ of cost no larger than $T$. □
Figure 6: Optimal traversal corresponding to structured alignment $A = \{(\Delta+j, j) \mid j \in [m]\} \in \mathcal{S}_{n,m}$.

We first verify that

$$\delta_{\mathrm{DTW}}(x,y) \le (n-m)(\ell_X M - s_X) + \min_{A\in\mathcal{S}_{n,m}}\sum_{(i,j)\in A}\delta_{\mathrm{DTW}}(x_i, y_j),$$

by designing a traversal (illustrated in Figure 6) that achieves this bound. Let $A \in \mathcal{S}_{n,m}$ be the alignment minimizing the expression, and note that $A = \{(\Delta+1, 1), \ldots, (\Delta+m, m)\}$ for some $0 \le \Delta \le n - m$. We first traverse $M^\kappa x_1 M^\kappa \cdots M^\kappa x_\Delta$ together with the first symbol of $y$, $M$, which contributes a cost of $\sum_{i=1}^{\Delta}\delta_{\mathrm{DTW}}(x_i, M) = \Delta(\ell_X M - s_X)$. For $i = 1, \ldots, m$ we repeat the following: we traverse $M^\kappa x_{\Delta+i}$ together with $M^\kappa y_i$ by traversing $M^\kappa$ and $M^\kappa$ simultaneously, and $x_{\Delta+i}$ and $y_i$ in a locally optimal manner; this incurs a cost of $\delta_{\mathrm{DTW}}(x_{\Delta+i}, y_i)$ for each $i$. Finally, we traverse the last block $M^\kappa$ in $y$ with the current block $M^\kappa$ in $x$, and then traverse the remainder $x_{\Delta+m+1}M^\kappa \cdots M^\kappa x_n M^\kappa$ of $x$ together with the last symbol of $y$, $M$. The total cost amounts to $\Delta(\ell_X M - s_X) + \sum_{i=1}^{m}\delta_{\mathrm{DTW}}(x_{\Delta+i}, y_i) + (n - \Delta - m)(\ell_X M - s_X) = (n-m)(\ell_X M - s_X) + \sum_{(i,j)\in A}\delta_{\mathrm{DTW}}(x_i, y_j)$.

In the remainder of the proof, we verify that

$$\delta_{\mathrm{DTW}}(x,y) \ge (n-m)(\ell_X M - s_X) + \min_{A\in\mathcal{A}_{n,m}}\Big(\sum_{(i,j)\in A}\delta_{\mathrm{DTW}}(x_i, y_j) + (m - |A|)\max_{i,j}\delta_{\mathrm{DTW}}(x_i, y_j)\Big).$$

Let $T^* = ((a^*_1, b^*_1), \ldots, (a^*_t, b^*_t))$ be an optimal traversal of $(x,y)$ (see Section 2 for the definition of traversals). Substrings $x'$ of $x$ and $y'$ of $y$ are paired if for some index $i$ in $x'$ and some index $j$ in $y'$ we have $(i,j) = (a^*_{t'}, b^*_{t'})$ for some $1 \le t' \le t$.

We call the $i$-th occurrence of $M^\kappa$ in $x$ the $i$-th M-block $M^x_i$ of $x$, and similarly for $y$. Let $X := \{M^x_i \mid i \in [n+1]\}$ and $Y := \{M^y_j \mid j \in [m+1]\}$ be the sets of all M-blocks of $x$ and $y$, respectively. We define a bipartite graph $G_M$ with vertex set $X \cup Y$, where M-blocks $M^x_i$ and $M^y_j$ are connected by an edge if and only if they are paired. We show the following properties of $G_M$.

Claim 6.5 (Planarity). For any paired $M^x_i, M^y_j$ and paired $M^x_{i'}, M^y_{j'}$ we have $i \le i'$ and $j \le j'$ (or $i \ge i'$ and $j \ge j'$).

Proof. By monotonicity of traversals, for $k \le k'$ we have $a^*_k \le a^*_{k'}$ and $b^*_k \le b^*_{k'}$. Thus, if $x[a^*_k]$ is in $M^x_i$ and $x[a^*_{k'}]$ is in $M^x_{i'}$, then $i \le i'$. Similarly, if $y[b^*_k]$ is in $M^y_j$ and $y[b^*_{k'}]$ is in $M^y_{j'}$, then $j \le j'$. Hence, for any paired $M^x_i, M^y_j$ and $M^x_{i'}, M^y_{j'}$ we have $i \le i', j \le j'$ or $i \ge i', j \ge j'$. □

Claim 6.6. $G_M$ has no isolated vertices.
Proof. Assume that some M-block $M^x_i$ is not paired with any M-block of $y$, and let $i$ be maximal with this property. Note that $i < n + 1$, as the last M-block of $x$ is always paired with the last M-block of $y$. Then there is some $j \in [m]$ such that $M^x_i$ is paired with $y_j$, but $M^x_i$ is not paired with any part of $y$ outside $y_j$. By maximality of $i$ and planarity, $M^y_{j+1}$ is paired with $x_i$ or $M^x_{i+1}$, as otherwise $M^x_{i+1}$ is not paired with any $M^y_{j'}$.

We can find a cheaper traversal as follows. Consider the first time $t_1$ at which the traversal $T^*$ is simultaneously at the first symbol of $M^x_i$ and any symbol of $y_j$ (this exists since $M^x_i$ is paired to $y_j$, but to no part of $y$ outside $y_j$), and any time $t_2$ at which $T^*$ is at $M^y_{j+1}$ and $x_i$ or at $M^y_{j+1}$ and $M^x_{i+1}$. Between $t_1$ and $t_2$, $T^*$ has a cost of at least $\delta_{\mathrm{DTW}}(y', M^\kappa)$, where $y'$ is any substring of $y_j$. By Claim 6.4 (3), this is at least $\kappa M/2$. We replace this part of $T^*$ by traversing (i) the remainder of $y_j$ with the first symbol of $M^x_i$, (ii) $M^x_i$ with the necessary part of $M^y_{j+1}$, and (iii) the necessary part of $x_i$ and $M^x_{i+1}$ with the current symbol in $y$, $M$. By Claim 6.4 (1), this incurs a cost of at most $\delta_{\mathrm{DTW}}(x_i, M) + \delta_{\mathrm{DTW}}(y_j, M) = \ell_X M - s_X + \ell_Y M - s_Y \le (\ell_X + \ell_Y)M$. By our choice of $\kappa = 3(\ell_X + \ell_Y)$, we improve the cost of the traversal, contradicting optimality of $T^*$. This shows that no vertex in $X$ is isolated; we argue similarly for vertices in $Y$. □

Claim 6.7. $G_M$ contains no path of length 3.

Proof. Assume that $G_M$ contains a path $M^x_i - M^y_j - M^x_{i'} - M^y_{j'}$. Without loss of generality we assume $i < i'$; the case $i > i'$ is symmetric. By planarity, we have $j < j'$. Since $G_M$ has no isolated vertices and by planarity, every $M^x_{i''}$ with $i < i'' < i'$ is paired with $M^y_j$, so we can assume that $i' = i + 1$ (after replacing $i$ with $i' - 1$). Similarly, we can assume $j' = j + 1$, and the path is $M^x_i - M^y_j - M^x_{i+1} - M^y_{j+1}$.

We can find a cheaper traversal as follows. Consider any time $t_1$ at which the traversal $T^*$ is simultaneously at $M^x_i$ and $M^y_j$ (this exists since $M^x_i$ and $M^y_j$ are paired), and consider any time $t_2$ at which $T^*$ is simultaneously at $M^x_{i+1}$ and $M^y_{j+1}$. Between $t_1$ and $t_2$, $T^*$ traverses $x_i$ with (parts of) $M^y_j$, and $y_j$ with (parts of) $M^x_{i+1}$, which by Claim 6.4 (1) incurs a cost of at least $\delta_{\mathrm{DTW}}(x_i, M) + \delta_{\mathrm{DTW}}(M, y_j) \ge (\ell_X + \ell_Y)M/2$. We replace this part of $T^*$ by traversing (i) the remaining parts of $M^x_i$ and $M^y_j$, (ii) $x_i$ and $y_j$ (in a locally optimal way), and (iii) the necessary parts of $M^x_{i+1}$ and $M^y_{j+1}$. This incurs a cost of $\delta_{\mathrm{DTW}}(x_i, y_j) \le (\ell_X + \ell_Y)M/2$ (by Claim 6.4 (2)), which contradicts optimality of $T^*$. □


By the above two claims, $G_M$ is a disjoint union of stars. By planarity and since $G_M$ has no isolated vertices, the leaves of any star in $G_M$ have to be consecutive M-blocks. Hence, we can write the components of $G_M$ as $C_1, \ldots, C_s$ with $C_k = \{M^x_{i_k}\} \cup \{M^y_{j_k}, M^y_{j_k+1}, \ldots, M^y_{j_k+d_k-1}\}$, and $C'_1, \ldots, C'_{s'}$ with $C'_k = \{M^y_{j'_k}\} \cup \{M^x_{i'_k}, M^x_{i'_k+1}, \ldots, M^x_{i'_k+d'_k-1}\}$.


Claim 6.8. We have $\sum_{k=1}^{s}d_k = m - s' + 1$ and $\sum_{k=1}^{s'}d'_k = n - s + 1$.

Proof. Since the components $C_1, \ldots, C_s$ and $C'_1, \ldots, C'_{s'}$ partition $G_M$, restricted to $Y$ we have $s' + \sum_{k=1}^{s}d_k = \sum_{k=1}^{s}|C_k \cap Y| + \sum_{k=1}^{s'}|C'_k \cap Y| = |Y| = m + 1$. The second claim follows analogously. □


We construct an alignment by aligning the $x_i, y_j$ that lie between two consecutive components of $G_M$. More formally, we define an alignment $A$ by aligning $(i_k - 1, j_k - 1)$ (for all $k \in [s]$ with $i_k, j_k > 1$) and aligning $(i'_k - 1, j'_k - 1)$ (for all $k \in [s']$ with $i'_k, j'_k > 1$). Since $G_M$ has no isolated vertices, $A$ is a valid alignment. We have $|A| = s + s' - 1$, since only the leftmost component of $G_M$ has $i_k = 1$, $j_k = 1$, $i'_k = 1$, or $j'_k = 1$, and all other components give rise to exactly one aligned pair.
Let us calculate the cost of $T^*$. Each $y_j$ that lies between the leaves of a star $C_k$ in $G_M$ (i.e., $j_k \le j < j_k + d_k - 1$) has to be traversed together with (parts of) $M^x_{i_k}$. By Claim 6.4 (1), this incurs a cost of at least $\delta_{\mathrm{DTW}}(M, y_j) = \ell_Y M - s_Y$. Likewise, each $x_i$ that lies between the leaves of a star $C'_k$ incurs a cost of at least $\ell_X M - s_X$. For any $(i,j) \in A$, $x_i$ is traversed together with a substring of $M^\kappa y_j M^\kappa$, and $y_j$ is traversed together with a substring of $M^\kappa x_i M^\kappa$. Hence, there are $a, a', b, b' \ge 0$ such that we traverse $M^a x_i M^{a'}$ together with $M^b y_j M^{b'}$. By Claim 6.4 (4), this incurs a cost of at least $\delta_{\mathrm{DTW}}(x_i, y_j)$. In total, the cost of the optimal traversal $T^*$ is

$$\delta_{\mathrm{DTW}}(x,y) \ge \sum_{k=1}^{s}(d_k - 1)(\ell_Y M - s_Y) + \sum_{k=1}^{s'}(d'_k - 1)(\ell_X M - s_X) + \sum_{(i,j)\in A}\delta_{\mathrm{DTW}}(x_i, y_j).$$

By Claim 6.8 we have $\sum_{k=1}^{s}(d_k - 1) = m - (s + s' - 1) = m - |A|$. Similarly, $\sum_{k=1}^{s'}(d'_k - 1) = n - |A| = (n - m) + (m - |A|)$. Additionally bounding $\ell_Y M - s_Y + \ell_X M - s_X \ge (\ell_X + \ell_Y)M/2 \ge \max_{i,j}\delta_{\mathrm{DTW}}(x_i, y_j)$, we obtain the desired inequality

$$\delta_{\mathrm{DTW}}(x,y) \ge (m - |A|)\max_{i,j}\delta_{\mathrm{DTW}}(x_i, y_j) + (n-m)(\ell_X M - s_X) + \sum_{(i,j)\in A}\delta_{\mathrm{DTW}}(x_i, y_j). \qquad □$$
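As an end-to-end sanity check of Lemma 6.3, the Python sketch below builds the strings of Definition 6.2 for small random instances of fixed types and verifies $C + \min_{A\in\mathcal{A}_{n,m}}\delta(A) \le \delta_{\mathrm{DTW}}(x,y) \le C + \min_{A\in\mathcal{S}_{n,m}}\sum_{(i,j)\in A}\delta_{\mathrm{DTW}}(x_i,y_j)$ with $C = (n-m)(\ell_X M - s_X)$. The DTW recurrence (sum of $|a-b|$, steps $(1,0),(0,1),(1,1)$) and the reading of $\mathcal{A}_{n,m}$ as monotone partial matchings are our assumptions; the instance types and function names are arbitrary choices.

```python
import random

def dtw(a, b):
    """DTW with pair cost |a_i - b_j| and steps (1,0), (0,1), (1,1)."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = abs(a[i - 1] - b[j - 1]) + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def gadget(curves, M, kappa):
    """Definition 6.2: M^kappa c_1 M^kappa c_2 ... M^kappa c_k M^kappa."""
    out = [M] * kappa
    for c in curves:
        out += list(c) + [M] * kappa
    return out

def lemma_6_3_bounds(xs, ys):
    """Return (lower, dtw(x, y), upper); xs share one type, ys share one type, n >= m >= 1."""
    n, m = len(xs), len(ys)
    l_x, s_x, l_y = len(xs[0]), sum(xs[0]), len(ys[0])
    M = 2 * max(max(c) for c in xs + ys)
    kappa = 3 * (l_x + l_y)
    d = [[dtw(xi, yj) for yj in ys] for xi in xs]
    C = (n - m) * (l_x * M - s_x)
    # Upper bound: best structured alignment {(shift + j, j)}.
    upper = C + min(sum(d[s + j][j] for j in range(m)) for s in range(n - m + 1))
    # Lower bound: best monotone alignment; every unaligned y_j pays dmax.
    dmax = max(v for row in d for v in row)
    f = [[j * dmax for j in range(m + 1)]] + [[0] * (m + 1) for _ in range(n)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = min(f[i - 1][j], f[i][j - 1] + dmax, f[i - 1][j - 1] + d[i - 1][j - 1])
    lower = C + f[n][m]
    value = dtw(gadget(xs, M, kappa), gadget(ys, M, kappa))
    return lower, value, upper

X_CURVES = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # type (l_X, s_X) = (3, 1)
Y_CURVES = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]   # type (l_Y, s_Y) = (3, 2)
rng = random.Random(3)
for _ in range(10):
    n = rng.randint(1, 3)
    m = rng.randint(1, n)
    lo, val, up = lemma_6_3_bounds([rng.choice(X_CURVES) for _ in range(n)],
                                   [rng.choice(Y_CURVES) for _ in range(m)])
    assert lo <= val <= up, (lo, val, up)
```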
7 Palindromic and Tandem Subsequences

In this section, we prove quadratic-time hardness of longest palindromic subsequence (LPS) and longest tandem subsequence (LTS) by presenting reductions from LCS. This proves Theorem 1.5. We will use the following simple facts about LCS, where we regard LCS as a minimization problem by defining $\delta_{\mathrm{LCS}}(x,y) := |x| + |y| - 2|\mathrm{LCS}(x,y)|$. In the whole section we let $\Sigma$ be any alphabet with $0, 1 \in \Sigma$.


Fact 7.1. Let $z, w$ be binary strings and $\ell, k \in \mathbb{N}_0$. Then we have (1) $\delta_{\mathrm{LCS}}(1^k z, 1^k w) = \delta_{\mathrm{LCS}}(z, w)$, (2) $\delta_{\mathrm{LCS}}(1^k z, w) \ge \delta_{\mathrm{LCS}}(z, w) - k$, and (3) $\delta_{\mathrm{LCS}}(0^\ell z, 1^k w) \ge \min\{k, \delta_{\mathrm{LCS}}(z, 1^k w) + \ell\}$. We obtain symmetric statements by flipping all bits and by reversing all involved strings.

Proof. (1) is a restatement of Claim 4.5 (1). (2) follows from Fact 5.5 (2). For (3), fix an LCS $s$ of $(0^\ell z, 1^k w)$. If $s$ starts with a 0, then it does not contain the leading $1^k$ of the second argument, leaving at least $k$ symbols unmatched, so that $\delta_{\mathrm{LCS}}(0^\ell z, 1^k w) \ge k$. Otherwise, if $s$ starts with a 1, then it does not contain the leading $0^\ell$ of the first argument, so that $|\mathrm{LCS}(0^\ell z, 1^k w)| = |\mathrm{LCS}(z, 1^k w)|$. Then we have $\delta_{\mathrm{LCS}}(0^\ell z, 1^k w) = |0^\ell z| + |1^k w| - 2|\mathrm{LCS}(0^\ell z, 1^k w)| = \ell + |z| + |1^k w| - 2|\mathrm{LCS}(z, 1^k w)| = \ell + \delta_{\mathrm{LCS}}(z, 1^k w)$. □
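Fact 7.1 is easy to test on short strings. The following Python sketch computes $\delta_{\mathrm{LCS}}$ with the standard LCS dynamic program and checks statements (1), (2), and (3) on random binary strings; the function names and test ranges are ours.

```python
import random

def lcs_len(a, b):
    """Length of a longest common subsequence (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, other in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ch == other else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def delta_lcs(a, b):
    """LCS as a minimization problem: |a| + |b| - 2|LCS(a, b)|."""
    return len(a) + len(b) - 2 * lcs_len(a, b)

def check_fact_7_1(trials=500, seed=0):
    rng = random.Random(seed)
    bits = lambda: "".join(rng.choice("01") for _ in range(rng.randint(0, 6)))
    for _ in range(trials):
        z, w = bits(), bits()
        k, l = rng.randint(0, 4), rng.randint(0, 4)
        assert delta_lcs("1" * k + z, "1" * k + w) == delta_lcs(z, w)                       # (1)
        assert delta_lcs("1" * k + z, w) >= delta_lcs(z, w) - k                             # (2)
        assert delta_lcs("0" * l + z, "1" * k + w) >= min(k, delta_lcs(z, "1" * k + w) + l)  # (3)
    return True

print(check_fact_7_1())  # True
```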


7.1 Longest Palindromic Subsequence

We show that computing the length of the longest palindromic subsequence is essentially computationally equivalent to computing the length of the longest common subsequence of two strings. For completeness, we provide the following well-known result, which shows that LPS can be reduced to LCS in linear time. Recall that for a string $x$ we denote the reversed string by $\mathrm{rev}(x)$.

Fact 7.2 (Folklore). For any input $x \in \Sigma^*$, we have $|\mathrm{LPS}(x)| = |\mathrm{LCS}(x, \mathrm{rev}(x))|$.
Proof. Let $p$ be a palindromic subsequence of $x$. Then $p = \mathrm{rev}(p)$ is a common subsequence of $x$ and $\mathrm{rev}(x)$, yielding $|\mathrm{LCS}(x, \mathrm{rev}(x))| \ge |\mathrm{LPS}(x)|$.

For the other direction, let $c$ be any LCS of $x$ and $\mathrm{rev}(x)$ of length $\ell$. It remains to show that we can find a palindromic subsequence $p$ of $x$ with $|p| \ge \ell$ (observe that $c$ itself is not necessarily a palindrome). Note that $c$ gives rise to a sequence of pairs $(a_1, b_1), \ldots, (a_\ell, b_\ell)$ such that $a_1 < \cdots < a_\ell$, $b_1 > \cdots > b_\ell$, and $c = (x[a_1], \ldots, x[a_\ell]) = (x[b_1], \ldots, x[b_\ell])$. Define $m := \lfloor \ell/2 \rfloor + 1$. If $a_m \le b_m$, then $a_1 < \cdots < a_m \le b_m < \cdots < b_1$ and hence $(x[a_1], \ldots, x[a_{m-1}], x[a_m], x[b_{m-1}], \ldots, x[b_1])$ is a palindromic subsequence of $x$ of length $2m - 1 = 2\lfloor \ell/2 \rfloor + 1 \ge \ell$. Otherwise, i.e., if $a_m > b_m$, then $b_\ell < \cdots < b_m < a_m < \cdots < a_\ell$ gives rise to the palindromic subsequence $(x[b_\ell], \ldots, x[b_m], x[a_m], \ldots, x[a_\ell])$ of $x$ with length $2(\ell - m + 1) = 2\ell - 2\lfloor \ell/2 \rfloor \ge \ell$. □
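Fact 7.2 can be checked by comparing an interval dynamic program for LPS against $|\mathrm{LCS}(x, \mathrm{rev}(x))|$. A minimal Python sketch (function names ours):

```python
def lcs_len(a, b):
    """Length of a longest common subsequence (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, other in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ch == other else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def lps_len(s):
    """Longest palindromic subsequence via the interval DP L[i][j] on s[i..j]."""
    n = len(s)
    if n == 0:
        return 0
    L = [[0] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):
        L[i][i] = 1
        for j in range(i + 1, n):
            if s[i] == s[j]:
                L[i][j] = (L[i + 1][j - 1] if j > i + 1 else 0) + 2
            else:
                L[i][j] = max(L[i + 1][j], L[i][j - 1])
    return L[0][n - 1]

print(lps_len("abca"), lcs_len("abca", "abca"[::-1]))  # 3 3
```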


To prove our lower bound for computing a longest palindromic subsequence of a string $x$, we present a simple reduction from LCS to LPS, and then appeal to our lower bound for LCS, which is equivalent to $\mathrm{Edit}(1, 1, 0, 2)$, see Theorem 1.2.


Theorem 7.3. On input $x, y \in \Sigma^*$, we can compute, in time $O(|x| + |y|)$, a string $z \in \Sigma^*$ and $\kappa \in \mathbb{N}$ such that $|\mathrm{LPS}(z)| = 3\kappa + 2|\mathrm{LCS}(x,y)|$.


Proof. Let $\kappa := 2(\ell_X + \ell_Y + 1)$, where $\ell_X := |x|$, $\ell_Y := |y|$. We define

$$z := x\,0^\kappa 1^\kappa 0^\kappa\,\mathrm{rev}(y).$$

Clearly, $z$ and $\kappa$ can be computed in time $O(\ell_X + \ell_Y)$. Let $s$ be an LCS of $x$ and $y$. Then $s\,0^\kappa 1^\kappa 0^\kappa\,\mathrm{rev}(s)$ is a palindromic subsequence of $z$, which proves $|\mathrm{LPS}(z)| \ge 3\kappa + 2|\mathrm{LCS}(x,y)|$.

To show |LPS (z )| < 3/t+2|LCS(x, y)\, fix a LPS p of z and let £ be its length. We define m := [fj 
and denote by p± = (p[l],... ,p[m\) the first “half” of p. Let z\ = (z[l],..., z[i]) be the shortest 
prehx of z that contains p\ as a subsequence and let Z 2 '■= ( z[i + 1],..., z[|z|]) be the remainder of 
z. Then pi, which by definition equals (p [£],... ,p[£ — m + 1]), is a subsequence of rev ( 2 : 2 ). This 
shows that if £ is even, then £ < 2|LCS(zi, rev(z 2 ))|. If £ is odd, we may without loss of generality 
assume that p[m + 1] = Z 2 [ 1]. Hence rev(pi) is a subsequence of z' 2 := (^ 2 [2],..., ^ 2 [|^ 2 1]), so that 
£ < 2|LCS(zi,rev(^ 2 ))| + 1. It remains to show that (i) |LCS(zi,rev(^2))| < §« + |LCS(x,y)| and 
(ii) |LCS(2i,rev(22))| < + |LCS(x, y)\ - \. 

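Assume that $|z_1| < \ell_x + \kappa$ or $|z_2| < (\ell_y + 1) + \kappa$. Then, since $|\mathrm{LCS}(u,v)| \le \min\{|u|, |v|\}$ for all strings $u, v$, we obtain $|\mathrm{LCS}(z_1, \mathrm{rev}(z_2'))| \le |\mathrm{LCS}(z_1, \mathrm{rev}(z_2))| < \max\{\ell_x, \ell_y + 1\} + \kappa \le \tfrac{3}{2}\kappa + |\mathrm{LCS}(x,y)|$, where the last step uses $\max\{\ell_x, \ell_y + 1\} \le \ell_x + \ell_y + 1 = \kappa/2$. Since $\kappa$ is even, all quantities involved are integers, so this proves both (i) and (ii). Hence we may assume that $z_1 = x\,0^{\kappa}1^{a}$ and $z_2 = 1^{a'}0^{\kappa}\,\mathrm{rev}(y)$ with $a + a' = \kappa$ and $a' \ge 1$, where we assume that $a' > a$ since the other case is symmetric. Note that (i) and (ii) are equivalent to $\delta_{\mathrm{LCS}}(z_1, \mathrm{rev}(z_2)) \ge \delta_{\mathrm{LCS}}(x,y)$ and $\delta_{\mathrm{LCS}}(z_1, \mathrm{rev}(z_2')) \ge \delta_{\mathrm{LCS}}(x,y)$, respectively. We compute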

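$$
\begin{aligned}
\delta_{\mathrm{LCS}}(z_1, \mathrm{rev}(z_2)) &= \delta_{\mathrm{LCS}}(x\,0^{\kappa}1^{a},\; y\,0^{\kappa}1^{a'}) \\
&= \delta_{\mathrm{LCS}}(x\,0^{\kappa},\; y\,0^{\kappa}1^{a'-a}) && \text{(by Fact 7.1(1))} \\
&\ge \min\{\kappa,\ \delta_{\mathrm{LCS}}(x\,0^{\kappa},\; y\,0^{\kappa})\} && \text{(by Fact 7.1(3))} \\
&= \min\{\kappa,\ \delta_{\mathrm{LCS}}(x,y)\} = \delta_{\mathrm{LCS}}(x,y), && \text{(by Fact 7.1(1))}
\end{aligned}
$$

where the final equality holds because $\delta_{\mathrm{LCS}}(x,y) \le \ell_x + \ell_y < \kappa$.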

By replacing $a'$ by $a' - 1 \ge a$, we obtain $\delta_{\mathrm{LCS}}(z_1, \mathrm{rev}(z_2')) \ge \delta_{\mathrm{LCS}}(x,y)$ by the same calculation. This yields $|\mathrm{LPS}(z)| = \ell \le 3\kappa + 2|\mathrm{LCS}(x,y)|$, as desired.

□ 
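Theorem 7.3 can be spot-checked by brute force on short binary inputs. The Python sketch below (all names hypothetical) computes $|\mathrm{LPS}(z)|$ with the standard interval dynamic program and $|\mathrm{LCS}(x,y)|$ with the textbook quadratic dynamic program. It then compares $|\mathrm{LPS}(z)|$ against $3\kappa + 2|\mathrm{LCS}(x,y)|$ for every pair of binary strings up to a given length.

```python
def lcs_len(x: str, y: str) -> int:
    """Length of a longest common subsequence, by the textbook O(|x||y|) DP."""
    L = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i, cx in enumerate(x):
        for j, cy in enumerate(y):
            L[i + 1][j + 1] = L[i][j] + 1 if cx == cy else max(L[i][j + 1], L[i + 1][j])
    return L[len(x)][len(y)]


def lps_len(z: str) -> int:
    """Length of a longest palindromic subsequence, by the interval DP."""
    n = len(z)
    if n == 0:
        return 0
    P = [[0] * n for _ in range(n)]  # P[i][j] = |LPS(z[i..j])|
    for i in range(n - 1, -1, -1):
        P[i][i] = 1
        for j in range(i + 1, n):
            if z[i] == z[j]:
                P[i][j] = (P[i + 1][j - 1] if j > i + 1 else 0) + 2
            else:
                P[i][j] = max(P[i + 1][j], P[i][j - 1])
    return P[0][n - 1]


def lps_reduction(x: str, y: str) -> tuple[str, int]:
    """z = x 0^k 1^k 0^k rev(y) with k = 2(|x| + |y| + 1)."""
    k = 2 * (len(x) + len(y) + 1)
    return x + "0" * k + "1" * k + "0" * k + y[::-1], k


def check_theorem_7_3(max_len: int) -> bool:
    """Check |LPS(z)| == 3k + 2|LCS(x, y)| for all binary x, y of length <= max_len."""
    words = [""] + [format(v, f"0{n}b") for n in range(1, max_len + 1) for v in range(2 ** n)]
    return all(
        lps_len(z) == 3 * k + 2 * lcs_len(x, y)
        for x in words for y in words
        for z, k in [lps_reduction(x, y)]
    )
```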
7.2 Longest Tandem Subsequence

As for LPS, our lower bound for LTS follows from a simple reduction from LCS together with our lower bound for LCS in Theorem 1.2.

Theorem 7.4. On input $x, y \in \Sigma^*$, we can compute, in time $O(|x| + |y|)$, a string $z \in \Sigma^*$ and $\kappa \in \mathbb{N}$ such that $|\mathrm{LTS}(z)| = 4\kappa + 2|\mathrm{LCS}(x,y)|$.

Proof. Let $\kappa := \ell_x + \ell_y$, where $\ell_x := |x|$ and $\ell_y := |y|$. We define

$$z := 0^{\kappa}x\,1^{\kappa}0^{\kappa}y\,1^{\kappa}.$$


Clearly, $z$ can be computed in time $O(\ell_x + \ell_y)$. Let $s$ be an LCS of $x$ and $y$. Then $t := t't'$ with $t' := 0^{\kappa}s\,1^{\kappa}$ is a tandem subsequence of $z$. Hence, we have $|\mathrm{LTS}(z)| \ge |t| = 4\kappa + 2|\mathrm{LCS}(x,y)|$.
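The lower-bound half of this proof translates into a few lines of Python. As before, this is a sketch assuming the symbols `0` and `1` are available, and `lts_reduction` and `lts_witness` are illustrative names. Given any common subsequence $s$ of $x$ and $y$, the sketch checks that $t = t't'$ with $t' = 0^{\kappa}s\,1^{\kappa}$ is a subsequence of $z$.

```python
def is_subsequence(p: str, z: str) -> bool:
    """True iff p can be obtained from z by deleting characters."""
    it = iter(z)
    return all(c in it for c in p)  # each `in` advances the iterator past the match


def lts_reduction(x: str, y: str) -> tuple[str, int]:
    """Build z = 0^k x 1^k 0^k y 1^k with k = |x| + |y|."""
    k = len(x) + len(y)
    return "0" * k + x + "1" * k + "0" * k + y + "1" * k, k


def lts_witness(x: str, y: str, s: str) -> str:
    """Tandem subsequence t't' of z with t' = 0^k s 1^k, for a common subsequence s."""
    assert is_subsequence(s, x) and is_subsequence(s, y)
    z, k = lts_reduction(x, y)
    half = "0" * k + s + "1" * k
    t = half + half
    assert is_subsequence(t, z)
    return t
```

For $x = 0110$, $y = 1010$, and $s = 110$, we get $\kappa = 8$ and a tandem subsequence of length $4 \cdot 8 + 2 \cdot 3 = 38$.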

To show $|\mathrm{LTS}(z)| \le 4\kappa + 2|\mathrm{LCS}(x,y)|$, fix an LTS $t = t't'$ of $z$. Let $i$ be the smallest index such that $t'$ is a subsequence of $z_1 := (z[1], \dots, z[i])$, and let $z_2 := (z[i+1], \dots, z[|z|])$. By the choice of $i$, $t'$ is also a subsequence of $z_2$, so that $|\mathrm{LTS}(z)| = 2|t'| \le 2|\mathrm{LCS}(z_1, z_2)|$. Thus, it remains to prove that $2|\mathrm{LCS}(z_1, z_2)| \le 4\kappa + 2|\mathrm{LCS}(x,y)|$.

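Assume that $|z_1| < \kappa + \ell_x$ or $|z_2| < \kappa + \ell_y$. Then, using $|\mathrm{LCS}(u,v)| \le \min\{|u|, |v|\}$ for all strings $u, v$, we conclude that $2|\mathrm{LCS}(z_1, z_2)| \le 2\kappa + 2(\ell_x + \ell_y) \le 4\kappa + 2|\mathrm{LCS}(x,y)|$, where the last step uses $\ell_x + \ell_y = \kappa$.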

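Hence, without loss of generality, we have (i) $z_1 = 0^{\kappa}x\,1^{\ell}$ and $z_2 = 1^{\ell'}0^{\kappa}y\,1^{\kappa}$, or (ii) $z_1 = 0^{\kappa}x\,1^{\kappa}0^{\ell}$ and $z_2 = 0^{\ell'}y\,1^{\kappa}$, for some $\ell, \ell' \ge 0$ with $\ell + \ell' = \kappa$. We only consider case (i), since case (ii) is symmetric. Note that $2|\mathrm{LCS}(z_1, z_2)| \le 4\kappa + 2|\mathrm{LCS}(x,y)|$ is equivalent to $\delta_{\mathrm{LCS}}(z_1, z_2) \ge \delta_{\mathrm{LCS}}(x,y)$. We obtain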


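$$
\begin{aligned}
\delta_{\mathrm{LCS}}(z_1, z_2) &= \delta_{\mathrm{LCS}}(0^{\kappa}x\,1^{\ell},\; 1^{\ell'}0^{\kappa}y\,1^{\kappa}) \\
&\ge \min\{\kappa,\ \delta_{\mathrm{LCS}}(0^{\kappa}x\,1^{\ell},\; 0^{\kappa}y\,1^{\kappa}) + \ell'\} && \text{(by Fact 7.1(3))} \\
&= \min\{\kappa,\ \delta_{\mathrm{LCS}}(x\,1^{\ell},\; y\,1^{\kappa}) + \ell'\} && \text{(by Fact 7.1(1))} \\
&= \min\{\kappa,\ \delta_{\mathrm{LCS}}(x,\; y\,1^{\kappa-\ell}) + \ell'\} && \text{(by Fact 7.1(1))} \\
&\ge \min\{\kappa,\ \delta_{\mathrm{LCS}}(x,y) - (\kappa - \ell) + \ell'\} && \text{(by Fact 7.1(2))} \\
&= \min\{\kappa,\ \delta_{\mathrm{LCS}}(x,y)\} = \delta_{\mathrm{LCS}}(x,y), && \text{(since } \ell + \ell' = \kappa \text{ and } \delta_{\mathrm{LCS}}(x,y) \le \ell_x + \ell_y = \kappa\text{)}
\end{aligned}
$$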


which proves the desired inequality $2|\mathrm{LCS}(z_1, z_2)| \le 4\kappa + 2|\mathrm{LCS}(x,y)|$.


□ 
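Analogously, Theorem 7.4 can be spot-checked by brute force. The sketch below (names hypothetical) computes $|\mathrm{LTS}(z)|$ through the split used in the proof. A tandem subsequence $t't'$ of $z$ exists if and only if $t'$ is a common subsequence of some prefix $z[1..i]$ and the remaining suffix $z[i+1..|z|]$, so $|\mathrm{LTS}(z)| = 2\max_i |\mathrm{LCS}(z[1..i], z[i+1..|z|])|$. This takes cubic time, which is fine for short inputs.

```python
def lcs_len(x: str, y: str) -> int:
    """Length of a longest common subsequence, by the textbook DP."""
    L = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i, cx in enumerate(x):
        for j, cy in enumerate(y):
            L[i + 1][j + 1] = L[i][j] + 1 if cx == cy else max(L[i][j + 1], L[i + 1][j])
    return L[len(x)][len(y)]


def lts_len(z: str) -> int:
    """|LTS(z)|: best split into a prefix and suffix that both contain t'."""
    return max(2 * lcs_len(z[:i], z[i:]) for i in range(len(z) + 1))


def lts_reduction(x: str, y: str) -> tuple[str, int]:
    """z = 0^k x 1^k 0^k y 1^k with k = |x| + |y|."""
    k = len(x) + len(y)
    return "0" * k + x + "1" * k + "0" * k + y + "1" * k, k


def check_theorem_7_4(max_len: int) -> bool:
    """Check |LTS(z)| == 4k + 2|LCS(x, y)| for all binary x, y of length <= max_len."""
    words = [""] + [format(v, f"0{n}b") for n in range(1, max_len + 1) for v in range(2 ** n)]
    return all(
        lts_len(z) == 4 * k + 2 * lcs_len(x, y)
        for x in words for y in words
        for z, k in [lts_reduction(x, y)]
    )
```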


8 Conclusion 

We prove conditional lower bounds for natural polynomial-time problems: edit distance for general operation costs (including its special case longest common subsequence), dynamic time warping, longest palindromic subsequence, and longest tandem subsequence. Our results give strong evidence that the known algorithms for these problems are optimal up to lower-order factors, even when restricted to binary strings and one-dimensional curves, respectively. We hope that the underlying framework will find applications in hardness proofs for further similarity measures, and that the studied problems will serve as starting points for further reductions.

It remains an open question whether constant-factor approximations running in strongly subquadratic time can be ruled out for the above problems assuming SETH. Furthermore, most polynomial-time lower bounds show quadratic-time barriers, and it is challenging to prove matching SETH-based lower bounds for problems with, say, cubic or $O(n^{3/2})$-time algorithms (only a few
results are known in this direction BED- 